I am creating an Android application to recognize books in a library. I take an image of a book's spine and send it to a server, which does the image processing, recognizes the book against a database, and sends the book's details back to the phone; if the book is not in the database, the server runs OCR on the spine and sends the recognized text to the mobile application. I am hoping to do the image processing in C#. Book recognition is done by comparing template images stored in the database with the image that was sent. I need some help figuring out the best approach for this. I have already researched some methods such as:
Template matching
Pattern recognition
Feature recognition
When it comes to images like book spines, which method is recommended? Are there any good APIs for this? I have researched OpenCV, but I would like to know if there are better APIs, and how I can use OCR when recognizing the book. I also want the application to be fast. Normally, when comparing two book spines (template and image), if I get about 60% similarity I can assume it is the same book. So I am looking for the optimal way to do this. Help me out!
While I have limited knowledge in the field of image processing, there is a library that offers such facilities: AForge.NET. That might be a good starting reference.
EDIT: for an introductory explanation of the theory behind image processing, this may also offer some guidance: http://www.societyofrobots.com/programming_computer_vision_tutorial.shtml
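If AForge.NET turns out to fit, a minimal sketch of template matching with its ExhaustiveTemplateMatching class could look roughly like this (the file names are placeholders, both images are assumed to be 24bpp bitmaps with the template no larger than the source, and the 0.6 threshold simply mirrors the 60% similarity mentioned in the question):

using System;
using System.Drawing;
using AForge.Imaging;

// Compare a book-spine photo against one template from the database.
// AForge's exhaustive matcher is brute force, so downscaling both images
// first helps a lot if speed matters.
Bitmap source = new Bitmap("spine_photo.jpg");
Bitmap template = new Bitmap("template_from_db.jpg");

// Only matches with at least 60% similarity are returned.
var matcher = new ExhaustiveTemplateMatching(0.60f);
TemplateMatch[] matches = matcher.ProcessImage(source, template);

foreach (TemplateMatch match in matches)
{
    Console.WriteLine("Similarity {0:P0} at {1}", match.Similarity, match.Rectangle);
}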
I understand that you are looking for an API or "already-built" image processing library to help you with this, but this answer might still help you, or other people who want to pursue something like this.
There are some pretty helpful research papers (including tests from successful implementations) on this Mobile Visual Search page at Stanford. Check out the heading "Book Spine Recognition for Asset Tracking" on that page.
I have thousands of JPEGs in a folder structure. These images are snapshots of my driveway at 2560 x 1440, taken and stored every 60 seconds.
I'd like to create a program that can detect, by analyzing an image, whether I or my wife was home at that particular time. I have a red car, she has a bright yellow car, so a simple color threshold should probably suffice. Another clear distinction is that we both have our own spot and never park in the other's. Also, other people don't use the driveway (and if they do, I don't mind a false positive). One minor complication is that the cameras switch to black/white in the dark (but that may be where the parking spot, rather than the color, comes in handy).
So I was hoping I could use ML.Net and train a model with some hand-annotated images, where I tag each image with whether I see my or her car in the driveway. I was thinking of annotating maybe 100 to a couple of hundred images for day and another set for night, feeding all these images to ML.Net to train it, then having it analyse a few hundred more images that I can manually check and correct, creating a sort of feedback loop to train on a few hundred more images.
Once the training is complete I'd like to analyze all images currently stored and each new image as it comes in to generate some data on when I'm (or my wife is) home, away etc.
My problem is (and this is probably going to be the reason for the question being closed as "too broad" or something): I have no clue how to do this. I have seen awesome tutorials that make it all seem like child's play, but when I try to do this in C# (my language of choice) and look for ML.Net how-tos, I can't seem to find anything that helps me in the right direction.
For example: Train a machine learning model with data that's not in a text file. I'm a competent programmer, so it's peanuts to create a CSV file / database / whatever that holds "1.jpg -> rob home, wife not home" data. But the how-to doesn't explain how to feed the images into ML.Net, and I haven't been able to find anything that does. The most probable cause is that I'm new to ML(.Net), and probably that I'm too stubborn to give up on accomplishing this in C#, but the information available is, weird as it sounds, both overwhelming and scarce. It usually leads me down some rabbit hole, only to find out after way too long that it's not what I want, and I can't find anything that hints I'm going in the right direction.
So long story short; tl;dr:
How do I feed images into ML.Net, how do I tell ML.Net that my/her car is in the driveway for any given image (training) and how do I get ML.Net to tell me whether it thinks I'm / my wife is home or not for a given image? Or is this not possible (currently)? I'm NOT looking for complete code but for pointers, hints, links, tutorials, examples or whatever may help me in the right direction.
You might find something useful here: Image recognition/classification using Microsoft ML.Net 0.2 (Machine learning).
However, I would encourage you to consider Python as the weapon of choice for your task.
There you would just store the data in different folders according to the label (you home, your wife home, both home, no car in the driveway, other) and you are ready to go.
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
It probably won't take you more than a weekend, and that includes learning the basics of Python.
Edit:
It seems ML.Net still does not support training image classification models: "Again, note that this sample only uses/consumes a pre-trained TensorFlow model with ML.NET API. Therefore, it does not train any ML.NET model. Currently, TensorFlow is only supported in ML.NET for scoring/predicting with existing TensorFlow trained models."
There is a thread about it here: https://github.com/dotnet/docs/issues/5379.
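To illustrate what "scoring only" looks like in practice, here is a rough sketch of feeding images into a pre-trained TensorFlow model through ML.NET. This assumes a later ML.NET release (1.x) than the 0.x versions discussed above, and the folder, file names, and tensor names "input" and "softmax" are placeholders for whatever your model actually uses:

using Microsoft.ML;

// Packages: Microsoft.ML, Microsoft.ML.ImageAnalytics,
//           Microsoft.ML.TensorFlow (+ SciSharp.TensorFlow.Redist).
var mlContext = new MLContext();

// One row per driveway snapshot; nothing is retrained here, the
// TensorFlow graph is only used for scoring.
IDataView data = mlContext.Data.LoadFromEnumerable(new[]
{
    new ImageInput { ImagePath = "snapshot_0001.jpg" }
});

var pipeline = mlContext.Transforms.LoadImages("image", @"C:\driveway", nameof(ImageInput.ImagePath))
    .Append(mlContext.Transforms.ResizeImages("image", 224, 224))
    .Append(mlContext.Transforms.ExtractPixels("input", "image"))
    .Append(mlContext.Model.LoadTensorFlowModel("frozen_model.pb")
        .ScoreTensorFlowModel(new[] { "softmax" }, new[] { "input" }));

ITransformer model = pipeline.Fit(data);
IDataView scored = model.Transform(data);

public class ImageInput
{
    public string ImagePath { get; set; }
}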
What you could try is using Emgu (http://www.emgu.com/wiki/index.php/Main_Page) in combination with OpenCV. This https://www.geeksforgeeks.org/opencv-python-program-vehicle-detection-video-frame/ is an example in Python, but it should translate well to C++ or C# using Emgu. Once the car is detected, check its position and color. This approach would probably also avoid having to label any data.
Alternatively, use a pre-trained model (h5 file), load it into ML.Net, and then check the position and mean color to work out whose car it is.
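As a very rough sketch of the "check position and mean colour" idea with Emgu CV: the file path and the parking-spot rectangle below are made-up placeholders, the thresholds would need tuning against real frames, and the night-time black/white images would need a separate rule.

using System;
using System.Drawing;
using Emgu.CV;
using Emgu.CV.Structure;

// Hypothetical region of the 2560x1440 frame covering one parking spot.
var mySpot = new Rectangle(200, 600, 400, 300);

using (var frame = new Image<Bgr, byte>(@"C:\driveway\snapshot.jpg"))
{
    frame.ROI = mySpot;
    Bgr avg = frame.GetAverage();      // mean colour inside my spot
    frame.ROI = Rectangle.Empty;       // reset the ROI

    // A red car should push the red channel well above green and blue.
    // Repeat the same check (with yellow-ish thresholds) for the other spot.
    bool redCarPresent = avg.Red > avg.Green + 40 && avg.Red > avg.Blue + 40;
    Console.WriteLine("My car appears to be home: {0}", redCarPresent);
}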
I have searched for days on how to use Silverlight to record video, but with no luck.
Most articles related to SL only talk about how to record audio or snap a picture, rather than recording a video file and saving it somewhere.
And I found there are no resources about it on the Internet (which surprised me)!
So could you provide me with example code and a proper explanation?
I am waiting for it.
PS: I do not want to use Flash, as neither iPhone nor iPad supports it.
Thanks
Fortunately, Mike Taulty's source code can easily be updated to work with Silverlight 5. You can download the fixed source code and try it out (disclaimer: all code is courtesy of Mike Taulty; I merely fixed SL5 compatibility and ran a cleanup). Use your favorite diff tool to see the changes I made.
I have tested it, and it generates video files that can be viewed in VLC media player (after selecting "build index" to fix the corrupted index). As the file format is not 100% correct, the files cannot be opened in Windows Media Player and the like, but I'm sure that can be fixed.
You should, however, be aware that Silverlight is not supported on iPhone, iPad, Android, or Windows Phone (Windows Phone apps are built on a special version of Silverlight, but the phone cannot run Silverlight applications in the browser).
Unfortunately, after hundreds of hours of researching possible solutions, I finally found the answer in a book: Pro Silverlight 4 in C# (Matthew MacDonald, Apress).
According to what is said on page 436, although you can do it with Silverlight 4 (you have to write your own file header helper, store the raw data as byte arrays, and later convert them into raw video data; what is worse, the audio and video are separate), it is just not practical or worthwhile, because it requires reams of complex, handwritten code and the conversion process is computationally expensive, which makes it extremely difficult to do in real time.
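For reference, the raw-capture part that the book builds on looks roughly like the sketch below in Silverlight 4/5. It only collects uncompressed frames in memory; everything MacDonald warns about (writing the file header, converting and muxing the separate audio and video) still has to happen on top of it.

using System.IO;
using System.Windows.Media;

// Receives raw frames from the webcam. Turning this buffer into a playable
// file still requires hand-writing the container header, which is the
// impractical part described above.
public class RawVideoSink : VideoSink
{
    private VideoFormat format;
    private readonly MemoryStream frames = new MemoryStream();

    protected override void OnCaptureStarted() { }
    protected override void OnCaptureStopped() { }

    protected override void OnFormatChange(VideoFormat videoFormat)
    {
        format = videoFormat;   // pixel format, width, height, stride
    }

    protected override void OnSample(long sampleTimeInHundredNanoseconds,
                                     long frameDurationInHundredNanoseconds,
                                     byte[] sampleData)
    {
        frames.Write(sampleData, 0, sampleData.Length);   // raw frame bytes
    }
}

// Hooking it up, e.g. in a button click handler:
// if (CaptureDeviceConfiguration.RequestDeviceAccess())
// {
//     var source = new CaptureSource
//     {
//         VideoCaptureDevice = CaptureDeviceConfiguration.GetDefaultVideoCaptureDevice()
//     };
//     var sink = new RawVideoSink { CaptureSource = source };
//     source.Start();
// }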
So I guess this is the conclusion of my question. What still makes sense for me now is to find out why Microsoft doesn't support it, and what new technology is going to be used (or has been used) to replace Silverlight.
Is it HTML5? But as far as I know, HTML only has a video tag, which can only be used as a video player; there is no tag that gives access to a web camera and saves the captured video and audio together as one file to somewhere like a local hard drive or network storage.
I hope you can come up with some thoughtful advice.
I am going to leave this question unanswered for a few days to see whether someone can come up with a fantastic solution.
Thanks again.
Is there a way to do this in MonoTouch?
http://definelabs.com/blogs/?p=17
I don't understand much of that Objective-C code...
I wrote an article on this: Accessing iPhone Album
I know this is an old post, but there is a demo app I've created that provides the features Nicklas Savonen requested.
What this demo app does is get the list of images from the asset library, load them into a UITableView, and maintain a selection status; the tick image is just an overlay that is hidden or shown based on the selection.
The following link explains the basic steps you need to take, since it would be difficult to understand from the project alone:
http://helpalittle.wordpress.com/2014/03/28/monotouch-multiple-image-picker/
And you can find the complete solution at the following path: https://onedrive.live.com/redir?resid=697F540B0A2F1506%21107
hope this helps.
I know I am not helping much at this point, but you need to learn at least a bit of Objective-C to be able to read it. The issue is that all the samples and plenty of resources for iOS development are in Objective-C, and converting them to MonoTouch is not that complex; in fact, all the constructs there have C# equivalents (the blocks in the sample you posted are in fact anonymous methods).
More to the point, multi-selection of images is possible in iOS SDK 4.x; if I find some spare time, this would be a nice little exercise for my blog.
As for which APIs to look at, these are the asset library APIs:
ALAssetsLibrary & ALAsset & ALAssetsGroup
In MonoTouch, the corresponding classes are (pseudo-code):
using MonoTouch.AssetsLibrary;
MonoTouch.AssetsLibrary.ALAsset;
MonoTouch.AssetsLibrary.ALAssetsLibrary;
MonoTouch.AssetsLibrary.ALAssetsGroup;
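To give a feel for how those bindings fit together, a rough sketch of enumerating the album with MonoTouch might look like this (written from memory against the MonoTouch.AssetsLibrary bindings, so double-check the exact delegate signatures in your MonoTouch version):

using System;
using MonoTouch.AssetsLibrary;

// e.g. inside ViewDidLoad of the controller hosting the UITableView
var library = new ALAssetsLibrary();

library.Enumerate(ALAssetsGroupType.All,
    (ALAssetsGroup group, ref bool stopGroups) =>
    {
        if (group == null)
            return;                 // null group means enumeration finished

        group.Enumerate((ALAsset asset, int index, ref bool stopAssets) =>
        {
            if (asset == null)
                return;             // null asset means this group is done

            Console.WriteLine(asset.AssetType);
            // asset.Thumbnail could feed the UITableView cell's image,
            // with the tick overlay toggled on selection.
        });
    },
    error => Console.WriteLine("Album access failed: " + error.LocalizedDescription));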
Hi all
I need your help in studying object recognition in video, as this will be my new project at my faculty.
I have previously studied in the field of computer vision.
I just need your suggestions (good books, web resources, tutorials, and so on) that will help me in my project.
My project will be in C# or MATLAB.
thanks
Just a simple suggestion here: break the problem down into small, manageable chunks. Since you're studying object recognition, you will probably want to get most of the software off the shelf so you can focus on studying and not debugging.
If you are using MATLAB, maybe you should look at this:
http://www.mathworks.com/products/image/
I would assume C# has some awesome image processing libraries now, thanks to the Xbox Kinect.
http://www.codeproject.com/Articles/148251/How-to-Successfully-Install-Kinect-on-Windows-Open.aspx
Another technology that is good for image processing is LabVIEW. If your faculty has licences and people who know it well enough to help you, it may be another option.
http://www.ni.com/labview/whatis/?nipkw=LabVIEW&nicam=OceaniaZA-VI2009&nigrp=labview&nisrc=Google&niurl=&ninet=search
It's not a simple subject, so finding specific books/resources is unlikely. As Oli said, your tutor is a specialist; they should be able to point you to guides/reference material.
Check out scientific journals in the area of image processing, such as Pattern Recognition, Pattern Recognition Letters, Computer Vision and Image Understanding, and the International Journal of Computer Vision, and books such as Computer Vision and Image Processing.
Is it possible to build an application using the .NET speech recognition classes that takes a WAV file and creates a text representation of it? For example, this is what I'm trying to do:
We have a QA department at my office, and they have to listen to hundreds of calls a day, which is quite impossible; there aren't enough people listening to keep up with everything. What I want to do is have the audio file uploaded to our server and have the server parse it and create a transcript. It doesn't matter if it's not perfect, just a base; it would be easier to skim a couple of dozen lines of text than to listen to a two-hour recording.
Based on a saved transcript, I can implement full-text search in the database and also run checks against the transcript if someone says something that's a misrepresentation.
So, is it possible to create an application using the .NET speech recognition classes, pass a WAV file to it, and have it spit out a rough transcript?
I've briefly dug around MSDN on the Speech classes while thinking up the idea, so I don't have much knowledge of whether it can be done.
If possible, I would appreciate any examples in C#. Topic 1055347 is similar to my question and was provided with links, the most specific of which is in C++. I'm not a C++ developer, nor have I ever gone to school for programming; I'm entirely self-taught in C#, so I would like to stay in the language I know.
Thanks in advance!
This sounds like a call-center type of application. Microsoft Speech Server has an SR engine optimized for telephony (8000 Hz sample rate), which will generate much better recognitions than the desktop SR engine. However, the engine isn't really designed for transcription (although it can do it), and the transcriptions definitely need to be reviewed before further processing occurs. Microsoft Exchange Unified Communications uses the SR engine to generate transcripts of voice mail, and while it's better than nothing, it often generates amusing nonsense.
With areas like speech recognition, you are likely to find either a standalone EXE or an API in C/C++.
For the links in the other topic, you can use a tool like the P/Invoke Interop Assistant to generate C# code. The C# code acts as a wrapper around the unmanaged DLL, so you can call it from C#.
This is likely to be the best way to get the functionality you are looking for.
Yes.
I did such an application a few years ago on the Tablet PC; you can read about it at http://web.archive.org/web/20060615192119/www.devx.com/TabletPC/Article/30761 (At the time, I spoke of using Interop to access the libraries, but I believe that the programming model has remained the same, just with a managed wrapper.)
At the time, the results were very poor, but for your use case it may be better than nothing.
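For completeness, with the managed wrapper this is roughly what the WAV-file case looks like using System.Speech (the desktop recognizer, so expect the quality caveats mentioned elsewhere; the file path is just a placeholder):

using System;
using System.Speech.Recognition;   // reference System.Speech.dll

using (var engine = new SpeechRecognitionEngine())
{
    engine.LoadGrammar(new DictationGrammar());        // free-form dictation
    engine.SetInputToWaveFile(@"C:\calls\call001.wav");

    RecognitionResult result;
    // Recognize() returns null at the end of the input
    // (or for a chunk it cannot recognize at all).
    while ((result = engine.Recognize()) != null)
    {
        Console.WriteLine(result.Text);                // append to transcript
    }
}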
How about routing the calls to Google Voice? I'm sure there are similar services. I have been amazed at its accuracy so far, plus you can click and listen to the audio if required. Google Voice will forward voice calls to SMS or email.
UPDATE: On rereading, since you are recording calls it may not work, as it works off the voice message that was left.