How to control third-party text-to-speech voices using SAPI 5 in C#?

Windows includes the SAPI 5 API, which lets you control TTS voices. However, I'm using Acapela Peter, and it does not show up in the Windows TTS dialog, so I cannot use this voice with the typical .NET APIs (for example, this CodeProject app). The voice did ship with a text file, VoiceDescriptions.txt, which seems to contain variables that I could feed into the SAPI engine to help it detect the voice. So my question is: how do I use this voice metadata to generate speech with SAPI? I have all of the referenced files installed with the voice. I found the SAPI SpVoice object, but its documentation does not mention any way of loading metadata from a text file.
[LANG,British]
#=eng.tml
PHOTREE=eng.trx
PROSO=eng.oso
F0=eng.f0r
DICTIONARY=eng.bab.dca
LDI=eng.bab.ldi
BNF=eng.bnx
BNFNOTAG=eng.notag.bnx
POST=eng.pst
GRI=eng.gri
GRO=eng.gro
SPD=180
Language=British
Info=eng.nfo
[VOICE,Peter22k,British,British]
Base=Peter22k.nuul
Coeff=Peter22k.coef
Database=Peter22k.vco
Info=Peter22k.nfo
Pitch=110
Speed=100
Freq=22050

It looks like Acapela has a separate product that adds a SAPI interface layer.
If you want to roll your own, you could write a SAPI engine interface for the Acapela TTS engine, but that's a significant undertaking (probably 2-3 developer-months).
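To see which voices SAPI actually knows about, you can enumerate them from .NET. The following is a minimal sketch, assuming a reference to System.Speech (part of .NET Framework, a NuGet package on newer .NET, Windows only). A voice appears in this list only after a SAPI engine has registered it, which is why a VoiceDescriptions.txt on its own won't make Peter show up. The class name `VoiceCheck`, the helper `FindVoice`, and the search string "Peter" are illustrative assumptions; the exact name an Acapela SAPI layer would register is not known here.

```csharp
// Sketch: list SAPI voices visible to .NET and speak with one of them.
// Assumes a reference to System.Speech (Windows only).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Speech.Synthesis;

static class VoiceCheck
{
    // Hypothetical helper: first voice name containing `fragment`
    // (case-insensitive), or null when nothing matches.
    public static string FindVoice(IEnumerable<string> installedNames, string fragment)
    {
        return installedNames.FirstOrDefault(
            name => name.IndexOf(fragment, StringComparison.OrdinalIgnoreCase) >= 0);
    }

    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            // Only voices registered with SAPI are returned here.
            List<string> names = synth.GetInstalledVoices()
                .Where(v => v.Enabled)
                .Select(v => v.VoiceInfo.Name)
                .ToList();

            foreach (string name in names)
                Console.WriteLine(name);

            // "Peter" is an assumed fragment of the registered voice name.
            string peter = FindVoice(names, "Peter");
            if (peter == null)
            {
                Console.WriteLine("No SAPI voice matching 'Peter' is installed.");
                return;
            }

            synth.SelectVoice(peter);
            synth.Speak("Hello from a SAPI voice.");
        }
    }
}
```

If Peter is missing from the printed list, no amount of code on the .NET side will find it; the voice has to be registered through a SAPI layer first.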

Related

Microsoft Speech Platform - how to update rules at runtime

I am using System.Speech to build a C# application with voice recognition capabilities.
I read this post, http://msdn.microsoft.com/en-us/library/jj127913.aspx, which explains how to update rules dynamically at runtime.
I wonder how I can do the same with the C# System.Speech API.
Any ideas?
Thank you
System.Speech is a bit different from the SAPI described at that link; however, it's even easier to construct grammars at runtime. You can use the GrammarBuilder class for that, adding any structure of choices and rules to construct the language you need to recognize.
Once you have updated the grammar, load it into the recognizer with LoadGrammar (and unload the old one with UnloadGrammar if it should no longer be active).
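A minimal sketch of updating a System.Speech grammar while recognition is running, assuming a default microphone and made-up command phrases. It builds grammars with `GrammarBuilder` and `Choices`, then swaps them inside the `RecognizerUpdateReached` handler after calling `RequestRecognizerUpdate()`, which asks the engine to pause at a point where grammar changes are safe. `RuntimeGrammarDemo` and `BuildGrammar` are hypothetical names.

```csharp
// Sketch: change a System.Speech grammar while recognition is running.
// Assumes a reference to System.Speech, a default microphone, and made-up phrases.
using System;
using System.Speech.Recognition;

static class RuntimeGrammarDemo
{
    // Hypothetical helper: a grammar that matches any one of the phrases.
    public static Grammar BuildGrammar(string name, params string[] phrases)
    {
        var builder = new GrammarBuilder(new Choices(phrases));
        return new Grammar(builder) { Name = name };
    }

    static void Main()
    {
        using (var recognizer = new SpeechRecognitionEngine())
        {
            Grammar current = BuildGrammar("commands", "open file", "close file");
            string[] pending = null;

            recognizer.SetInputToDefaultAudioDevice();
            recognizer.LoadGrammar(current);
            recognizer.SpeechRecognized += (s, e) =>
                Console.WriteLine("Heard: " + e.Result.Text);

            // Raised after RequestRecognizerUpdate(), once the engine has paused
            // at a safe point for grammar changes.
            recognizer.RecognizerUpdateReached += (s, e) =>
            {
                if (pending == null) return;
                recognizer.UnloadGrammar(current);
                current = BuildGrammar("commands", pending);
                recognizer.LoadGrammar(current);
                pending = null;
                Console.WriteLine("Grammar updated.");
            };

            recognizer.RecognizeAsync(RecognizeMode.Multiple);

            Console.WriteLine("Press Enter to add \"save file\"...");
            Console.ReadLine();
            pending = new[] { "open file", "close file", "save file" };
            recognizer.RequestRecognizerUpdate();

            Console.WriteLine("Press Enter to quit.");
            Console.ReadLine();
            recognizer.RecognizeAsyncStop();
        }
    }
}
```

Rebuilding the whole grammar keeps the example simple; for larger vocabularies you could instead keep several smaller grammars loaded and toggle their `Enabled` property.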

I need to find a good BSD Licensed C# SIP Softphone

I have written an API for a SIP phone system and would like to integrate a full SIP softphone into what I already have.
I'm looking to integrate an open source softphone that:
includes a full featured SIP stack
is written in C#, or easily integrable into a C# application
is BSD or similarly licensed
exposes basic features (dialing, transferring, holding, etc.) in a fairly high-level way (i.e. it would be easy to just write a UI for it and voilà, I have a custom softphone)
My goal is to build a proof-of-concept softphone quickly so I can demo it. I'd take a completely built BSD-licensed softphone if I could just rewrite the C# front end.
Thanks and I look forward to the invariably useful feedback.
As far as I know, there isn't an open source C# softphone out there. My own SIP stack is written in C# and is open source, but it's used for a SIP application server and is missing chunks of functionality needed for a softphone, such as an RTP implementation, codecs, and audio device interop.
The closest thing I know of that may suit your needs is sipek voip (I'm pretty sure it used to be called pjsip.net), a C# wrapper around the pjsip open source SIP and media libraries. Those libraries are themselves written in C and licensed under the GPL, so pjsip doesn't meet your licensing requirement even if you were prepared to use the wrapper library.
The most efficient path for you may be to look around at existing softphones and find one that offers skinning services. In this question, which is similar to yours, it sounds like the developer is using Zoiper. I know CounterPath also offers skinning, but it's not cheap. Of course, if you've got a few months of developer resources sitting idle, I'm sure you could build on my stack or someone else's.
We are using the Mizu webphone. It isn't written in C#, but it is advertised as cross-platform, so I think you should be able to use it from C# as well (we use it from ASP.NET).

Face detection for C# in ASP.NET

I'm looking for a specific form of facial recognition: I only want to detect where all the faces are located in a class photo of students.
In other words, I'm not trying to compare two faces to see whether they match.
How can I do this in C#? I can't seem to find any open-source projects on NuGet regarding this, and I've looked on CodePlex too.
My personal preference for anything computer-vision related is OpenCV (http://opencv.willowgarage.com/wiki/), but it isn't natively made for C#.
However, after a quick Google search, I found http://www.emgu.com/wiki/index.php/Main_Page which says "Emgu CV is a cross platform .Net wrapper to the Intel OpenCV image processing library. Allowing OpenCV functions to be called from .NET compatible languages such as C#, VB, VC++, IronPython etc. The wrapper can be compiled in Mono and run on Linux / Mac OS X."
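For detection only (no matching), a Haar cascade through Emgu CV is a common starting point. The sketch below assumes Emgu CV 4.x from NuGet with its native runtime package, and OpenCV's `haarcascade_frontalface_default.xml` copied next to the executable. The file names, the `FaceFinder`/`Describe` names, and the detection parameters are assumptions; Haar cascades can miss small, turned, or partly covered faces in a group photo, so expect to tune `minNeighbors` and the minimum face size.

```csharp
// Sketch: detect (not recognize) faces with a Haar cascade via Emgu CV.
// Assumes Emgu CV 4.x from NuGet plus its native runtime package, and OpenCV's
// haarcascade_frontalface_default.xml copied next to the executable.
using System;
using System.Drawing;
using Emgu.CV;
using Emgu.CV.CvEnum;

static class FaceFinder
{
    // Formats one detected face rectangle for printing.
    public static string Describe(Rectangle face)
    {
        return $"x={face.X}, y={face.Y}, w={face.Width}, h={face.Height}";
    }

    static void Main(string[] args)
    {
        // "class-photo.jpg" is a hypothetical default input.
        string imagePath = args.Length > 0 ? args[0] : "class-photo.jpg";

        using (var classifier = new CascadeClassifier("haarcascade_frontalface_default.xml"))
        using (Mat image = CvInvoke.Imread(imagePath))
        using (var gray = new Mat())
        {
            if (image.IsEmpty)
            {
                Console.WriteLine("Could not read " + imagePath);
                return;
            }

            // Haar cascades work on grayscale; equalizing helps with uneven lighting.
            CvInvoke.CvtColor(image, gray, ColorConversion.Bgr2Gray);
            CvInvoke.EqualizeHist(gray, gray);

            // 1.1 / 4 / 24x24 are common starting values, not tuned ones.
            Rectangle[] faces = classifier.DetectMultiScale(gray, 1.1, 4, new Size(24, 24));

            Console.WriteLine($"Found {faces.Length} face(s)");
            foreach (Rectangle face in faces)
                Console.WriteLine(Describe(face));
        }
    }
}
```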
Now that Face.com has been retired after being acquired by Facebook, I use Sky Biometry, which has a C# .NET API and is free.
It's cloud based and obviously requires an Internet connection, but who cares.
I recommend checking out FaceRecognitionDotNet (https://github.com/takuya-takeuchi/FaceRecognitionDotNet), which is based on Face Recognition (https://github.com/ageitgey/face_recognition), a Python library. Both are open source under the MIT license.
Another option is Microsoft's Cognitive Services Face API, which you can use directly from Azure or from a local Docker container. You can find more about it here: https://learn.microsoft.com/en-us/azure/cognitive-services/face/

Testing WIA without having a scanner/camera device

I wrote some simple scanning code using WIA, but I don't have a scanner, so I can't test it. Can I simulate a WIA device for testing?
This does exactly what you want: https://github.com/twain/wia-on-twain
It simulates a scanner and publishes both a TWAIN and a WIA interface. Scanning a graphical page is simulated too, so you can try out different resolutions and colour schemes.
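After installing a simulator like this, a quick way to confirm that WIA actually sees it is to list the registered devices. A minimal sketch, assuming a COM reference to "Microsoft Windows Image Acquisition Library v2.0" (wiaaut.dll), which provides the `WIA` interop namespace; `WiaDeviceList` and `Label` are hypothetical names.

```csharp
// Sketch: list the devices WIA currently sees, e.g. to confirm a simulator registered.
// Assumes a COM reference to "Microsoft Windows Image Acquisition Library v2.0"
// (wiaaut.dll), which provides the WIA interop namespace. Windows only.
using System;
using WIA;

static class WiaDeviceList
{
    // Maps the WIA device type to a short label.
    public static string Label(WiaDeviceType type)
    {
        switch (type)
        {
            case WiaDeviceType.ScannerDeviceType: return "scanner";
            case WiaDeviceType.CameraDeviceType: return "camera";
            case WiaDeviceType.VideoDeviceType: return "video";
            default: return "other";
        }
    }

    static void Main()
    {
        var manager = new DeviceManager();

        // DeviceInfos is a 1-based COM collection; foreach sidesteps the indexing.
        foreach (DeviceInfo info in manager.DeviceInfos)
            Console.WriteLine(Label(info.Type) + ": " + info.DeviceID);

        if (manager.DeviceInfos.Count == 0)
            Console.WriteLine("No WIA devices registered.");
    }
}
```

If the simulated scanner is listed as "scanner", your own scanning code should be able to connect to it through the same `DeviceInfo`.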
Not sure, but maybe http://scanworkssoftware.com/twainimporter.aspx will help you
Alternatively, go to http://twain.org and, under the "Fast Find" section, click the last link, titled "Sample Data Source & Application". This will install TWAIN 2.0 and a sample source named "TWAIN2 FreeImage Software Scanner", which has some basic scanning features. It does not have a driver interface, but it will let you perform scans and get/set some general properties.
With enough effort and the WIA SDK (and probably the Windows DDK as well) you probably can. But it will be a large amount of effort, especially compared to the price tag of a cheap scanner.
I'm assuming your time is worth something. If this is a hobby project, compare the price of a cheap scanner to the time saved, which you could spend on the fun parts of the project. If this is a work project, the time saved is more valuable to your customer than to you, but there should still be a business case for buying hardware that saves more time than it costs.
I know this question is very old, but I'll post this as a reference.
Since Windows 10, Microsoft has published a GitHub repository of sample drivers, including WIA ones:
https://github.com/Microsoft/Windows-driver-samples/tree/master/wia
I haven't been able to test them yet, but they should create a test device.
Try checking whether this virtual webcam supports the WIA interface: http://www.soundmorning.com/
If so, you are all set and ready to go.
You can also search for "Fake webcam", there are many versions.
One thing to be concerned about is that not all WIA drivers are created equal. We recently had trouble with some Brother WIA drivers that were supposedly certified: the driver would not allow access to the feeder tray, so we ended up having to write a TWAIN integration as well.
(1)
http://graphics.kodak.com/docimaging/US/en/Support_Center/Document_Scanners/Desktop/i65_Scanner/Support/Drivers_And_Downloads/i55_and_i65_Scanner_Driver/index.htm
InstallSoftware__v1.7.exe
(2)
http://sourceforge.net/projects/twain-samples/files/TWAIN%202%20Sample%20Application/
WIAonTWAIN_SDK.msi
(3)
For testing only you can also use the free demo version of the commercial file import TWAIN driver XPCTWAIN.
Product info: http://www.jse.de/products.html#xpctwain
Demo download: http://www.jse.de/download/setup_xd.exe
setup_xd.exe
Sounds like writing a WIA-compatible virtual device might be something worth giving back to the community.

Question on Speech Recognition classes in .NET

Is it possible to build an application with the .NET speech recognition classes, pass it a WAV file, and have it produce a text representation? Here is what I'm trying to do:
We have a QA department at my office, and they have to listen to hundreds of calls a day, which is practically impossible; there aren't enough people to keep up with everything. What I want is to upload the audio file to our server and have the server parse it and create a transcript. It doesn't have to be perfect: a rough baseline would do, since skimming a couple of dozen lines of text is easier than listening to a two-hour recording.
With a saved transcript, I could implement full-text search in the database and also check the transcript for places where someone may be misrepresenting something.
So, is it possible to create an application using the .NET speech recognition classes and just pass the WAV file to it and it spit out a rough transcript?
I've only briefly dug through MSDN on the Speech classes while thinking up the idea, so I don't know whether this can be done.
If it is possible, I would appreciate any examples in C#. Question 1055347 is similar to mine, and its answers provide links, the most specific of which is in C++. I'm not a C++ developer, nor have I ever gone to school for programming; I'm self-taught in C#, so I would like to stay with the language I know.
Thanks in advance!
This sounds like you've got a call-center type of application. Microsoft Speech Server has an SR engine optimized for telephony (8000 Hz sample rate), which will produce much better recognition results than the desktop SR engine. However, the engine isn't really designed for transcription (although it can do it), and the transcriptions definitely need to be reviewed before further processing occurs. Microsoft Exchange Unified Communications uses the SR engine to generate transcripts of voicemail, and while that's better than nothing, it often generates amusing nonsense.
For areas like speech recognition, you are likely to find either a standalone EXE or a C/C++ API.
For the links in the other topic, you can use a tool like the P/Invoke Interop Assistant to generate C# code. The generated C# code acts as a wrapper around the unmanaged DLL, so you can call it from C#.
This is likely to be the best way to get the functionality you are looking for.
Yes.
I did such an application a few years ago on the Tablet PC; you can read about it at http://web.archive.org/web/20060615192119/www.devx.com/TabletPC/Article/30761 (At the time, I spoke of using Interop to access the libraries, but I believe that the programming model has remained the same, just with a managed wrapper.)
At the time, the results were very poor, but maybe for your use-case better than nothing.
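With the managed System.Speech classes the same idea looks roughly like this: load a `DictationGrammar`, point the engine at the WAV file with `SetInputToWaveFile`, and collect each recognized phrase until the input stream ends. A sketch, assuming Windows with System.Speech and a desktop recognizer installed; `WavTranscriber`, `JoinPhrases`, and "call.wav" are hypothetical. As the Speech Server answer points out, expect rough results on 8 kHz call recordings with the desktop engine.

```csharp
// Sketch: rough transcript of a WAV file with System.Speech dictation.
// Assumes Windows, a reference to System.Speech, and an installed desktop recognizer.
using System;
using System.Collections.Generic;
using System.Speech.Recognition;
using System.Threading;

static class WavTranscriber
{
    // Puts each recognized phrase on its own line.
    public static string JoinPhrases(IEnumerable<string> phrases)
    {
        return string.Join(Environment.NewLine, phrases);
    }

    public static string Transcribe(string wavPath)
    {
        var phrases = new List<string>();
        using (var done = new ManualResetEventSlim(false))
        using (var recognizer = new SpeechRecognitionEngine())
        {
            // Free-form dictation rather than a fixed command grammar.
            recognizer.LoadGrammar(new DictationGrammar());
            recognizer.SetInputToWaveFile(wavPath);

            recognizer.SpeechRecognized += (s, e) => phrases.Add(e.Result.Text);
            // Raised when the operation ends: normally at end of file, or on error.
            recognizer.RecognizeCompleted += (s, e) => done.Set();

            // Multiple mode keeps recognizing phrase after phrase until input ends.
            recognizer.RecognizeAsync(RecognizeMode.Multiple);
            done.Wait();
        }
        return JoinPhrases(phrases);
    }

    static void Main(string[] args)
    {
        // "call.wav" is a hypothetical default input file.
        string path = args.Length > 0 ? args[0] : "call.wav";
        Console.WriteLine(Transcribe(path));
    }
}
```

Using `RecognizeAsync` with `RecognizeCompleted` rather than looping on `Recognize()` avoids stopping early when a single segment of the call is rejected.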
How about routing the calls to Google Voice? I'm sure there are similar services. I have been amazed at its accuracy so far, plus you can click and listen to the audio if required. Google Voice will forward voice calls to SMS or email.
UPDATE: On rereading, since you are recording calls, this may not work for you; I use it for the voicemail messages callers leave.
