I'm in dire need of a replacement for a wrapper being used in a C# application. Basically, we need to attach a webcam feed to one of two picture boxes. This is used to take still images at the push of a button, which may detach the camera feed, attach the still image to that picture box, and reattach the camera feed later. We had previously found some free code that uses a CaptureDevice.cs file and Pinvoke.dll to tie into avicap32.dll. Unfortunately, it has random, intermittent errors that cannot be reliably reproduced; it's just too flaky. At some random point, one of the picture boxes may go black and won't show the feed until a picture is taken, at which point the proper picture is attached to the picture box. And even if there's only one webcam attached, it keeps prompting for the webcam to be selected, something it wouldn't do otherwise.
Quite frankly, I'm surprised and dismayed that Microsoft hasn't included anything in .NET to cover webcam video feeds. I'm looking for something reliable and relatively simple to implement to replace this buggy webcam system.
It's time to use the MediaCapture object. Use this sample for Windows 10: https://github.com/Microsoft/Windows-universal-samples/tree/master/Samples/CameraStarterKit/
And read this article as well: http://www.codepool.biz/csharp-camera-api-video-frame.html
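As a minimal sketch of the MediaCapture flow (UWP; the method name, file name, and target folder below are my own arbitrary choices, and the full preview wiring is in the sample above):
using System.Threading.Tasks;
using Windows.Media.Capture;
using Windows.Media.MediaProperties;
using Windows.Storage;

// Initialize the default capture device and save one still photo.
// Preview and error handling are omitted; see the CameraStarterKit sample.
private async Task CaptureStillAsync()
{
    var capture = new MediaCapture();
    await capture.InitializeAsync();

    StorageFile file = await KnownFolders.PicturesLibrary.CreateFileAsync(
        "still.jpg", CreationCollisionOption.GenerateUniqueName);
    await capture.CapturePhotoToStorageFileAsync(
        ImageEncodingProperties.CreateJpeg(), file);
}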
May I suggest
http://www.emgu.com/wiki/index.php/Main_Page
I have used OpenCV in many C++ projects and, compared to other things I've tried, it works very well for webcams. Emgu is just a C# wrapper for OpenCV.
Here is a sample project to try it out on. It's very basic and simple, but should work right away.
http://dl.dropbox.com/u/18919663/vs%20samples/OpenCVCSharpTest.zip (just uploaded)
Sample:
using Emgu.CV;
using Emgu.CV.Structure;
...

public partial class Form1 : Form
{
    public Capture cvWebCam;

    public Form1()
    {
        InitializeComponent();
        try
        {
            // Opens the default capture device (camera index 0)
            cvWebCam = new Capture();
            timer1.Start();
        }
        catch
        {
            Console.WriteLine("Default camera not found or could not start");
        }
    }

    private void timer1_Tick(object sender, EventArgs e)
    {
        if (cvWebCam != null)
        {
            // Grab one frame per tick and hand a copy to the PictureBox;
            // dispose the previous bitmap so frames don't leak memory
            using (Emgu.CV.Image<Bgr, byte> frame = cvWebCam.QueryFrame())
            {
                if (pictureBox1.BackgroundImage != null)
                    pictureBox1.BackgroundImage.Dispose();
                pictureBox1.BackgroundImage = frame.ToBitmap();
            }
        }
    }
}
Try DirectShow.NET, a free wrapper library for accessing DirectShow functionality from .NET:
http://directshownet.sourceforge.net
Its code samples contain a sample app for capturing video from webcams, too:
http://sourceforge.net/projects/directshownet/files/DirectShowSamples/
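As a quick smoke test before digging into the samples, a minimal sketch like this (assuming the DirectShowLib assembly from that package is referenced) lists the video capture devices DirectShow can see, which is the first step of the library's capture samples:
using System;
using DirectShowLib;

class ListCaptureDevices
{
    static void Main()
    {
        // Enumerate all registered video input devices (webcams, capture cards)
        DsDevice[] devices = DsDevice.GetDevicesOfCat(FilterCategory.VideoInputDevice);
        foreach (DsDevice device in devices)
            Console.WriteLine(device.Name);
    }
}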
An alternative title could be: What happened to PIN_CATEGORY_STILL?
I am currently comparing images that were captured using DirectShow and PIN_CATEGORY_STILL with images that were captured using UWP MediaCapture.
On the device I am testing with, DirectShow detects a PIN_CATEGORY_STILL, but I am not able to initialize an instance of MediaCapture with anything other than PhotoCaptureSource.VideoPreview.
MediaCaptureInitializationSettings settings = new()
{
    VideoDeviceId = someDeviceId,
    PhotoCaptureSource = PhotoCaptureSource.Photo
};

MediaCapture capture = new();

// this throws an exception:
// "The capture source does not have an independent photo stream."
await capture.InitializeAsync(settings);
At this point I'm not even sure if PhotoCaptureSource.Photo is meant to be used as an equivalent to PIN_CATEGORY_STILL.
Images captured with PIN_CATEGORY_STILL are way brighter in a dark environment and have much better quality, in both file size and resolution (which is clear to me, since I am using PhotoCaptureSource.VideoPreview for MediaCapture).
Considering the resource Win32 and COM for UWP apps, it seems that UWP MediaCapture does not use DirectShow underneath but Media Foundation (which is meant to be the successor to DirectShow).
That resource led me to the StackOverflow question Media Foundation is incorrectly marking still image capture stream descriptors as video capture, which basically states that Media Foundation has no PIN_CATEGORY_STILL but instead reports a 1 FPS video capability for such devices (or profiles).
Since I am using neither Media Foundation directly nor C++, I tried to verify this by querying GetAvailableMediaStreamProperties:
private void Foo()
{
    var videoRecordProperties = GetEncodingProperties(MediaStreamType.VideoRecord);
    var photoProperties = GetEncodingProperties(MediaStreamType.Photo);
    var videoPreviewProperties = GetEncodingProperties(MediaStreamType.VideoPreview);
}

private List<VideoEncodingProperties> GetEncodingProperties(MediaStreamType streamType)
{
    // MediaCapture was previously initialized with PhotoCaptureSource.VideoPreview
    return MediaCapture.VideoDeviceController
        .GetAvailableMediaStreamProperties(streamType)
        .OfType<VideoEncodingProperties>()
        .ToList();
}
None of these returns a VideoEncodingProperties with only 1 FPS.
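A small helper along these lines (the name DumpFrameRates and the Debug output are just for illustration) can print the frame rate behind each returned property to look for a 1 FPS entry:
private void DumpFrameRates(IEnumerable<VideoEncodingProperties> properties)
{
    foreach (var p in properties)
    {
        // FrameRate is a ratio (e.g. 30/1); a "still" capability should show up as 1/1
        double fps = (double)p.FrameRate.Numerator / p.FrameRate.Denominator;
        Debug.WriteLine($"{p.Width}x{p.Height} @ {fps} fps ({p.Subtype})");
    }
}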
To test MediaCapture any further, I tried some of the sample applications from the UWP sample repository: CameraAdvancedCapture, CameraOpenCV, and CameraManualControls. The results were not nearly as good as good old PIN_CATEGORY_STILL.
What happened to PIN_CATEGORY_STILL?
Is there any way to capture images without DirectShow/PIN_CATEGORY_STILL while still keeping this level of quality?
Any enlightenment is much appreciated.
I'm working on a small personal application that should read some text (2 sentences at most) from a really simple Android screenshot. The text is always the same size, same font, and in approx. the same location. The background is very plain, usually a few shades of 1 color (think like bright orange fading into a little darker orange). I'm trying to figure out what would be the best way (and most importantly, the fastest way) to do this.
My first attempt involved the IronOcr C# library, and to be fair, it worked quite well! But I've noticed a few issues with it:
It's not 100% accurate
Despite having a community/trial version, it sometimes throws exceptions telling you to get a license
It takes ~400ms to read a ~600x300 pixel image, which in the case of my simple image, I consider to be rather long
As strange as it sounds, I have a feeling that libraries like IronOcr and Tesseract may just be too advanced for my needs. To improve speed I have even written a piece of code to "threshold" my image first, making it completely black and white.
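A sketch of that kind of thresholding (a naive per-pixel version using System.Drawing; the cutoff of 128 is an arbitrary placeholder):
// Naive luminance threshold: everything darker than the cutoff becomes
// black, everything else white. GetPixel/SetPixel is slow but clear.
static Bitmap Threshold(Bitmap source, int cutoff = 128)
{
    var result = new Bitmap(source.Width, source.Height);
    for (int y = 0; y < source.Height; y++)
    {
        for (int x = 0; x < source.Width; x++)
        {
            Color c = source.GetPixel(x, y);
            int luminance = (c.R + c.G + c.B) / 3;
            result.SetPixel(x, y, luminance < cutoff ? Color.Black : Color.White);
        }
    }
    return result;
}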
My current IronOcr settings look like this:
ImageReader = new AdvancedOcr()
{
    CleanBackgroundNoise = false,
    EnhanceContrast = false,
    EnhanceResolution = false,
    Strategy = AdvancedOcr.OcrStrategy.Fast,
    ColorSpace = AdvancedOcr.OcrColorSpace.GrayScale,
    DetectWhiteTextOnDarkBackgrounds = true,
    InputImageType = AdvancedOcr.InputTypes.Snippet,
    RotateAndStraighten = false,
    ReadBarCodes = false,
    ColorDepth = 1
};
And I could totally live with the results I've been getting using IronOcr, but the licensing exceptions ruin it. I also don't have $399 USD to spend on a private hobby project that won't even leave my own PC :(
But my main goal with this question is to find a better, faster, or more efficient way to do this. It doesn't necessarily have to be an existing library; I'd be more than willing to write my own kind of letter-detection code that would work (only?) for screenshots like mine, if someone can point me in the right direction.
I have researched this topic, and the best solution I could find is Azure Cognitive Services. You can use the Computer Vision API to read text from an image. Here is the complete document.
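For a rough idea of the shape of that API, here is a minimal sketch based on the Computer Vision Read quickstart (the key, endpoint, and file name are placeholders, and the polling loop is simplified):
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;

class ReadTextSample
{
    static async Task Main()
    {
        var client = new ComputerVisionClient(
            new ApiKeyServiceClientCredentials("<subscription-key>"))
            { Endpoint = "<endpoint>" };

        // Submit the image; the service returns an operation to poll
        using var stream = File.OpenRead("screenshot.png");
        var header = await client.ReadInStreamAsync(stream);
        string operationId = header.OperationLocation.Substring(
            header.OperationLocation.Length - 36); // trailing GUID

        ReadOperationResult result;
        do
        {
            await Task.Delay(250);
            result = await client.GetReadResultAsync(Guid.Parse(operationId));
        }
        while (result.Status == OperationStatusCodes.Running ||
               result.Status == OperationStatusCodes.NotStarted);

        foreach (var page in result.AnalyzeResult.ReadResults)
            foreach (var line in page.Lines)
                Console.WriteLine(line.Text);
    }
}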
How fast does it have to be?
If you are using C#, I recommend the Google Cloud Vision API. You pay per request, but the first 1000 per month are free (check pricing here). It does require a web request, but I find it to be very quick.
using Google.Cloud.Vision.V1;
using System;

namespace GoogleCloudSamples
{
    public class QuickStart
    {
        public static void Main(string[] args)
        {
            // Instantiates a client
            var client = ImageAnnotatorClient.Create();
            // Load the image file into memory
            var image = Image.FromFile("wakeupcat.jpg");
            // Performs text detection on the image file
            var response = client.DetectText(image);
            foreach (var annotation in response)
            {
                if (annotation.Description != null)
                    Console.WriteLine(annotation.Description);
            }
        }
    }
}
I find it works well for pictures and scanned documents, so it should work perfectly for your situation. The SDK is also available in other languages, such as Java, Python, and Node.
Can someone please tell me why I am getting a black screen with no video, only sound?
private void screen1btnPlay_Click(object sender, EventArgs e)
{
    ScreenOne playScreen1 = new ScreenOne();
    playScreen1.PlayScreenOne();
}
... and the other form is like this:
public partial class ScreenOne : Form
{
    public ScreenOne()
    {
        InitializeComponent();
    }

    public void PlayScreenOne()
    {
        axVLCPlugin21.playlist.add("file:///" + @"Filepath", null);
        axVLCPlugin21.playlist.play();
    }
}
Sound works fine, but no video. All the properties of the VLC control are left at their defaults; is there something I need to change when using this plugin across multiple forms? Anyone know what's wrong?
Update: I rebuilt the program in WPF and I am having the same problem. When I have a button on the second form (the same form as the player) it works fine, but as soon as I call it from the main form, sound only. Ugh!
I don't know for certain, but I can offer some suggestions.
Make sure VLC is installed as 32-bit; I once solved a problem that way.
There is a high probability your problem relates to "C:\Program Files (x86)\VideoLAN\VLC\plugins". Check your plugins; maybe your audio_filter, audio_mixer, or audio_output plugins are missing.
You can remove VLC, then download and install the latest 32-bit VLC.
I think that will solve your problem. Don't forget that AxAXVLC works with VLC's plugins.
I figured out my problem on my own!
When I was creating this instance:
ScreenOne playScreen1 = new ScreenOne();
I was actually creating a redundant instance of the form. I'm not sure if that's the right way to put it, but I basically already had an instance of the second form and was making another, separately named instance of it.
I already had this in my code to open the second form:
Screen2 Screen2 = new Screen2();

private void openScreen2Button_Click(object sender, EventArgs e)
{
    Screen2.Show();
}
Then later I was doing this, which is WRONG; I was using playScreen1 when I should still have been using Screen2:
Screen2 playScreen1 = new Screen2();
playScreen1.PlayScreenOne();
So when I wanted to call the method that plays the media player on the second form from the first one, I just needed to use the same instance of Screen2 that I had created to open the form in the first place, instead of creating a new instance just to call the method.
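In code, the fix looks roughly like this (reconstructed from the snippets above; the field name screen2 is just illustrative):
// One instance, created once and reused everywhere
Screen2 screen2 = new Screen2();

private void openScreen2Button_Click(object sender, EventArgs e)
{
    screen2.Show();
}

private void screen1btnPlay_Click(object sender, EventArgs e)
{
    // Call the play method on the same instance that is visible on
    // screen, not on a fresh, never-shown copy of the form
    screen2.PlayScreenOne();
}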
IDK if my explanation makes sense, or maybe it's basics to most people (I'm a noob), but if anyone comes across this problem, message me and I'll try to help.
o7
I've worked with OpenCV in C++ before and it worked great. Now I'm developing a C# project using Emgu CV for gender recognition. I've got a problem with the Predict function: every time I run it, the program crashes on Predict; when I remove the Predict line, it runs. Here's my code:
private void Window_Loaded(object sender, RoutedEventArgs e)
{
    FaceRecognizer face = new FisherFaceRecognizer(0, 3500);
    face.Load("colorFisherFaceModel.yml");
    Image<Bgr, Byte> img1 = new Image<Bgr, Byte>("C:\\Users\\sguthesis\\Pictures\\me.jpg");
    cascade = new CascadeClassifier("C:\\Users\\sguthesis\\documents\\visual studio 2013\\Projects\\EmguCV FFR with Image\\EmguCV FFR with Image\\haarcascade_frontalface_alt_tree.xml");
    FaceRecognizer.PredictionResult predictedLabel = face.Predict(img1);
}
Also, I want to get an output of 1 or 2: 1 for male and 2 for female. I have trained on a lot of data, saved in colorFisherFaceModel.yml. It ran well in OpenCV, but I don't know how to use it in Emgu CV.
I'm also working with Emgu CV, so I guess I can point out a few things here.
First, you load a "yml" file. Is this a file you saved after the recognizer was trained, or did you obtain it from somewhere else? My understanding is that you first have to train your recognizer. What is the structure of the yml file? And why do you load the cascade; where are you going to use it? (Normally it is used to detect the face.)
If the program crashed, it is most probably because there are no items in the trained set (in this case, I guess, the yml file). Also, from what I've read, you need a minimum of two faces in the training set in order to use the Fisher recognizer.
This happens because the FaceRecognizer needs to be trained before the Predict method can be called.
You can load an existing training file with the face.Load("yourTrainingFile.xml") method, or train from scratch with the face.Train(yourImages.ToArray(), imageIds.ToArray()) method.
Place your FaceRecognizer.PredictionResult predictedLabel = face.Predict(img1); code within a try/catch block so the debugger won't close and you will catch the error message.
PS: if the number of pictures has changed since the last Training.xml file was created, it's recommended to call the face.Train method instead of the face.Load method!
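A minimal sketch of that try/catch (note: Fisher-based recognizers typically expect a grayscale image with the same dimensions as the training images, so the conversion and the 100x100 size below are assumptions to check against your own training data):
try
{
    // Convert/resize the probe image to match the training images;
    // 100x100 is only a placeholder for whatever size the model used
    Image<Gray, Byte> probe = img1.Convert<Gray, Byte>()
        .Resize(100, 100, Emgu.CV.CvEnum.INTER.CV_INTER_CUBIC);
    FaceRecognizer.PredictionResult predictedLabel = face.Predict(probe);
    Console.WriteLine("Label: " + predictedLabel.Label +
        ", Distance: " + predictedLabel.Distance);
}
catch (Exception ex)
{
    // The exception text usually explains what Predict was unhappy about
    Console.WriteLine(ex.Message);
}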
What I am doing:
My main intent is to enable user-friendly text-to-speech for personal use on Windows 7. The approach should work in Google Chrome, Visual Studio, and Eclipse.
Code example:
The following code creates a global keyboard hook for Ctrl+Alt+Space, which calls hookEvent. When the event fires, it starts/stops speaking the clipboard contents (which can be updated with Ctrl+C).
/// <summary>
/// KeyboardHook from: http://www.liensberger.it/web/blog/?p=207
/// </summary>
private readonly KeyboardHook hook = new KeyboardHook();
private readonly SpeechSynthesizer speaker =
    new SpeechSynthesizer { Rate = 3, Volume = 100 };

private void doSpeaking(string text)
{
    // starts / stops speaking, while not blocking UI
    if (speaker.State != SynthesizerState.Speaking)
        speaker.SpeakAsync(text);
    else
        speaker.SpeakAsyncCancelAll();
}

private void hookEvent(object sender, KeyPressedEventArgs e)
{
    this.doSpeaking(Convert.ToString(Clipboard.GetText()));
}

public Form1()
{
    InitializeComponent();
    hook.KeyPressed += new EventHandler<KeyPressedEventArgs>(hookEvent);
    hook.RegisterHotKey(ModifierKeysx.Control | ModifierKeysx.Alt, Keys.Space);
}
Question:
I would prefer not to use the clipboard, or at least to restore its value afterwards, something like:
[MethodImpl(MethodImplOptions.Synchronized)]
private string getSelectedTextHACK()
{
    object restorePoint = Clipboard.GetData(DataFormats.UnicodeText);
    SendKeys.SendWait("^c");
    string result = Convert.ToString(Clipboard.GetText());
    Clipboard.SetData(DataFormats.UnicodeText, restorePoint);
    return result;
}
What are my options?
Edit:
To my surprise, I found that my clipboard reader is the best way to go. I created a notification area app that responds to left-click (speaks the clipboard) and right-click (opens a menu). In the menu, the user can change the speed, speak, or create an audio file.
MS provides accessibility tools that cover what you're trying to do; take a look at documentation about screen scraping. In short, every component is accessible in some manner, and if you use some of the Windows debugging tools you can see the component names/structures within. You can then use that. However, it's complicated, as most of the time you need to be very specific for each application you intend to scrape from.
If you manage to scrape, you don't need to use the clipboard, as you can access the text property of the apps directly. It's not something I've had to do, hence I've no code to offer off the top of my head, but the term "screen scraping" should point you in the right direction.
To expand a little on what Bugfinder said, Microsoft provides a UI Automation framework to solve problems like the one you mentioned.
In particular, you can use the TextSelectionChangedEvent of TextPattern.
The problem with this solution is that it only works on supported operating systems and applications, and not all support this.
Your clipboard solution is acceptable for applications that do not provide a good automation interface.
But for many applications the UI Automation Framework will work well and will provide you with a far better solution.
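As a rough sketch of what that looks like (assuming references to UIAutomationClient and UIAutomationTypes; the helper name is mine), reading the current selection from the focused element via TextPattern avoids the clipboard entirely:
using System.Windows.Automation;
using System.Windows.Automation.Text;

static string GetSelectedText()
{
    // Ask the element with keyboard focus whether it supports TextPattern
    AutomationElement focused = AutomationElement.FocusedElement;
    object patternObj;
    if (focused != null &&
        focused.TryGetCurrentPattern(TextPattern.Pattern, out patternObj))
    {
        var textPattern = (TextPattern)patternObj;
        TextPatternRange[] selection = textPattern.GetSelection();
        if (selection.Length > 0)
            return selection[0].GetText(-1); // -1 = no length limit
    }
    return string.Empty; // element does not expose its text via UIA
}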