I am currently trying to read text from a few images, and it seems that the Google Vision API is skipping some 0's.
Here is the code:
Google.Cloud.Vision.V1.Image image = Google.Cloud.Vision.V1.Image.FromFile(imagepath);
ImageAnnotatorClient client = ImageAnnotatorClient.Create();
IReadOnlyList<EntityAnnotation> response = client.DetectText(image);
string test = string.Empty;

foreach (EntityAnnotation annotation in response)
{
    if (annotation.Description != null)
    {
        Console.WriteLine(annotation.Description);
        test += Environment.NewLine + annotation.Description;
    }
}
Here are the images it is attempting: Attempt 1, Attempt 2, Attempt 3
Are there settings I need to change to make it accept 0's?
Also, here is the output from each attempt:
Attempt 1: https://pastebin.com/dNxRt7QK
Attempt 2: https://pastebin.com/XVZzmtTg
Attempt 3: https://pastebin.com/2kQMiC8h
It's really good at reading everything else, but it really struggles with 0's, specifically the Deaths values in Attempts 2 and 3.
Edit:
Adding a few results showing this from the Google drag-and-drop testing tool:
Attempt 1
Attempt 2
In order to get better results, it is recommended not to use lossy formats (for example, JPEG). Using lossy formats, or reducing file sizes with them, may result in a degradation of image quality and hence of Vision API accuracy.
The recommended image size is 1024 x 768 for the TEXT_DETECTION and DOCUMENT_TEXT_DETECTION features. As an additional note:
The Vision API requires images to be a sufficient size so that important features within the request can be easily distinguished. Sizes smaller or larger than these recommended sizes may work. However, smaller sizes may result in lower accuracy, while larger sizes may increase processing time and bandwidth usage without providing comparable benefits in accuracy. Image size should not exceed 75M pixels (length x width) for OCR analysis.
The items discussed above can be found in this article.
With the code you are using, you can alternatively use the DOCUMENT_TEXT_DETECTION feature and pick whichever gives you better results. I see that you are using the code in this link for TEXT_DETECTION. Try using the code in this link for DOCUMENT_TEXT_DETECTION.
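If it helps, here is a minimal sketch of the DOCUMENT_TEXT_DETECTION variant with the same client library; it reuses imagepath from your snippet, and DetectDocumentText is the client method that issues a DOCUMENT_TEXT_DETECTION request:
Google.Cloud.Vision.V1.Image image = Google.Cloud.Vision.V1.Image.FromFile(imagepath);
ImageAnnotatorClient client = ImageAnnotatorClient.Create();

// DetectDocumentText issues a DOCUMENT_TEXT_DETECTION request instead of TEXT_DETECTION.
TextAnnotation document = client.DetectDocumentText(image);

// The full recognized text, with line breaks, is available in one property.
Console.WriteLine(document.Text);
You can then compare this output against the TEXT_DETECTION output for the same images and keep whichever handles the 0's better.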
In case the issue still persists after the suggested actions, I recommend that you contact Google Cloud Platform Support or create a public issue via this link.
I'm new to Machine Learning and working on my master's thesis using ML.NET. I'm trying to use the GloVe model to vectorise a CV text, but I'm finding it hard to wrap my head around the process. I have the pipeline set up as below:
var pipeline = context.Transforms.Text.NormalizeText("Text", null,
        keepDiacritics: false, keepNumbers: false, keepPunctuations: false)
    .Append(context.Transforms.Text.TokenizeIntoWords("Tokens", "Text"))
    .Append(context.Transforms.Text.RemoveDefaultStopWords("WordsWithoutStopWords", "Tokens",
        Microsoft.ML.Transforms.Text.StopWordsRemovingEstimator.Language.English))
    .Append(context.Transforms.Text.ApplyWordEmbedding("Features", "WordsWithoutStopWords",
        Microsoft.ML.Transforms.Text.WordEmbeddingEstimator.PretrainedModelKind.GloVe300D));

var embeddingTransformer = pipeline.Fit(emptyData);
var predictionEngine = context.Model.CreatePredictionEngine<Input, Output>(embeddingTransformer);

var data = new Input { Text = TextExtractor.Extract("/attachments/CV6.docx") };
var prediction = predictionEngine.Predict(data);

Console.WriteLine($"Number of features: {prediction.Features.Length}");
Console.WriteLine("Features: ");
foreach (var feature in prediction.Features)
{
    Console.Write($"{feature} ");
}
Console.WriteLine(Environment.NewLine);
From what I've studied about vectorization, each word in the document should be converted into a vector, but when I print the features, I can see 900 features getting printed. Can someone explain how this works? There are very few examples and tutorials about ML.NET available on the internet.
The vector of 900 features coming from the WordEmbeddingEstimator is the min/max/average of the individual word embeddings in your phrase. Each of the min/max/average vectors is 300-dimensional for the GloVe 300D model, giving 900 total.
The min/max gives the bounding hyper-rectangle for the words in your phrase. The average gives the standard phrase embedding.
See: https://github.com/dotnet/machinelearning/blob/d1bf42551f0f47b220102f02de6b6c702e90b2e1/src/Microsoft.ML.Transforms/Text/WordEmbeddingsExtractor.cs#L748-L752
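A tiny worked example of the idea, using made-up 3-dimensional vectors for a two-word phrase (not real GloVe values); the min/average/max block ordering here follows my reading of the linked source, so double-check it there:
// Hypothetical 3-dimensional embeddings for a two-word phrase (not real GloVe values).
float[] w1 = { 0.2f, -1.0f,  0.5f };
float[] w2 = { 0.6f,  0.0f, -0.5f };

int dim = w1.Length;
float[] features = new float[3 * dim];   // min | average | max, concatenated

for (int d = 0; d < dim; d++)
{
    features[d]           = Math.Min(w1[d], w2[d]);   // element-wise min
    features[dim + d]     = (w1[d] + w2[d]) / 2f;     // element-wise average
    features[2 * dim + d] = Math.Max(w1[d], w2[d]);   // element-wise max
}

// With 300-dimensional GloVe vectors this layout is what produces 3 * 300 = 900 features.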
GloVe is short for Global Vectors (for Word Representation).
GloVe is an unsupervised (no human labeling of the training set) learning method. The vectors associated with each word are generally derived from each word's proximity to others in sentences.
Once you have trained your network (presumably on a much larger data set than a single CV/resume) then you can make interesting comparisons between words based on their absolute and relative "positions" in the vector space. A much less computationally expensive way of developing a network to analyze e.g. documents is to download a pre-trained dataset. I'm sure you've found this page (https://nlp.stanford.edu/projects/glove/) which, among other things, will allow you to access pre-trained word embeddings/vectorizations.
Final thoughts: I'm sorry if all of this is redundant information for you, especially if this really turns out to be an ML.NET framework syntax question. I don't know exactly what your goal is, but:
900 dimensions seems like an awful lot for looking at CV's. Maybe this is an ML.net default? I suspect that 300-500 will be more than adequate. See what the pre-trained data sets provide.
If you only intend to train your network from zero on a single CV, then this method is going to be wholly inadequate.
Your best approach is likely to be a sort of transfer learning approach where you obtain a liberally licensed network that has been pre-trained on a massive data set in your language of interest (usually easy for academic work). Then perform additional training using a smaller, targeted group of training-only CV's to add any specialized words to the 'vocabulary' of your network. Then perform your experimentation and analysis on a set of test CV's, which have never been used to train the network.
My goal is to get data (pulse/heart rate) from the Torntisc T1 fitness bracelet in my own application and process the data from the bracelet independently.
To implement this I use Xamarin, and I found the Bluetooth LE plugin for Xamarin to connect to the device and receive data from it. However, all the characteristics obtained are called "Unknown characteristic" and their values are 0 bytes, even though the device has 5 services, each with 3 characteristics. Only one service has differently named characteristics: "Device Name", "Appearance", and "Peripheral Preferred Connection Parameters". Even there, the values are 0 bytes everywhere. How do I get the characteristics? How do I get the pulse?
The bracelet has an official application, H Band 2.0, which shows a fairly large number of settings for the bracelet, so the question arises: where does all of that come from?
I attempted to decompile the native H Band 2.0 app (attempt here). I found the classes responsible for the connection in the following directory: sources\no\nordicsemi\android\dfu. I can see it is done via BluetoothGatt. Unfortunately I am not an expert in Java and Android and am unfamiliar with this library. I didn't find any methods or anything related to the "pulse", only a large number of magic characteristic-parsing calls: parse(characteristic)
foreach (var TestService in Services)
{
    var characteristics = await TestService.GetCharacteristicsAsync();
    foreach (var Characteristic in characteristics)
    {
        var properties = Characteristic.Properties;
        var name = Characteristic.Name;
        var serv = Characteristic.Service;
        var value = Characteristic.Value;
        var stringValue = value.ToString();

        string result = "";
        if (value.Length != 0)
            result = System.Text.Encoding.UTF8.GetString(value, 0, value.Length - 1);
    }
}
To start with, you can use the following app to get a better overview of the services and characteristics you are working with, without having to write code just to read the values you need.
Having said that, you will need documentation to be able to communicate with the device: what data you send, what the acceptable responses are, how they map to meaningful data, and so on. The core of BLE is the "low energy" part, which means exchanging as little data as possible, for example mapping integers to enum values that you cannot interpret without the documentation. You can work your way back from the decompiled source, but it will be orders of magnitude more difficult.
One more thing: BLE is notoriously unreliable (you will understand if you run into GATT 133 errors on Samsungs :), so most implementations also add a sort of network layer on top to handle drops and graceful degradation, as well as sending larger pieces of data. This is custom-developed per app/device, and you also need extensive documentation for it, which is no trivial matter.
The library you've chosen is quite good and wraps most of what you need quite well, but it does not handle the instability, so you have to take care of that part yourself.
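As a starting point, here is a rough sketch of reading and subscribing to characteristics with the plugin, rather than only inspecting Characteristic.Value (which stays empty until something has actually been read or notified). This is based on my reading of the Plugin.BLE API (CanRead/CanUpdate, ReadAsync, StartUpdatesAsync, ValueUpdated); `device` stands for your connected IDevice, and the exact ReadAsync return type varies between plugin versions, so adjust to the version you are using:
// device is assumed to be the connected IDevice from the plugin.
foreach (var service in await device.GetServicesAsync())
{
    foreach (var characteristic in await service.GetCharacteristicsAsync())
    {
        // Characteristic.Value stays empty until you explicitly read or subscribe.
        if (characteristic.CanRead)
        {
            var bytes = await characteristic.ReadAsync();
            Console.WriteLine($"{characteristic.Id}: {BitConverter.ToString(bytes)}");
        }

        // Heart-rate style data is usually pushed via notifications rather than reads.
        if (characteristic.CanUpdate)
        {
            characteristic.ValueUpdated += (s, e) =>
                Console.WriteLine($"{e.Characteristic.Id}: {BitConverter.ToString(e.Characteristic.Value)}");
            await characteristic.StartUpdatesAsync();
        }
    }
}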
Cheers :)
I want to display a DICOM file having photometric interpretation MONOCHROME2.
Some of the specifications of the image are:
Rows: 1024
Columns: 1024
No of Frames: 622
Bits Allocated: 16
Bits Stored: 10
High Bit: 9
Pixel Representation: 0
Sample per pixel: 1
I am using gdcm.ImageRegionReader to extract a single frame's byte array in the following way:
gdcm.ImageRegionReader _regionReader = new gdcm.ImageRegionReader();
_regionReader.SetRegion(_boxRegion); // _boxRegion is some region covering one frame
_regionReader.ReadIntoBuffer(Result, (uint)Result.Length); // Result is a byte[] sized for the region

Marshal.Copy(Result.ToArray(), 0, _imageData.GetScalarPointer(), Result.ToArray().Length);
_viewer.SetInput(_imageData); // _viewer is a vtkImageViewer, _imageData a vtkImageData
But when I display that file, it is displayed like this..
but the original image is like this..
So can someone help me with how to load and display MONOCHROME2 DICOM images?
Disclaimer: I have never used the toolkit in question; I am attempting to answer based on my understanding of DICOM. In my experience with DICOM, syntax was rarely the problem; the real problem was the concepts and terms.
I see two problems in the output.
One is about the part of the image that is rendered. Notice that the entire data is not rendered in your output. Check the toolkit documentation to see how to set the dimensions/bounds while rendering the image.
The other problem is about output quality. Initially, I suspected the Transfer Syntax might be the issue. I do not think it is, but just make sure you are decompressing the image before rendering; I am not sure how your toolkit handles compression while rendering.
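One more thing worth checking on the quality side (my own suggestion, not something from the toolkit documentation): with Bits Stored = 10 inside 16 allocated bits, the viewer needs a window/level that matches the 10-bit value range, otherwise a MONOCHROME2 image tends to look washed out or almost black. With vtkImageViewer that would be a sketch like:
// Assumed values for 10-bit unsigned data (0..1023); prefer the dataset's
// Window Center / Window Width tags if they are present.
_viewer.SetColorWindow(1024);  // full 10-bit range
_viewer.SetColorLevel(512);    // mid-point of that range
_viewer.Render();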
There is another way available to render pixel data in the toolkit:
_ImageViewer.SetRenderWindow(renderWindow);
_ImageViewer.GetRenderer().AddActor2D(sliceStatusActor);
_ImageViewer.GetRenderer().AddActor2D(usageTextActor);
_ImageViewer.SetSlice(_MinSlice);
_ImageViewer.Render();
Above code is copied from "http://www.vtk.org/Wiki/VTK/Examples/CSharp/IO/ReadDICOMSeries". Detailed code is available there.
Following links may also be helpful:
http://vtk.1045678.n5.nabble.com/How-to-map-negative-grayscale-to-color-td5737080.html
https://www.codeproject.com/Articles/31581/Displaying-bit-Images-Using-C
You should really use vtkGDCMImageReader2 instead in your code. vtkGDCMImageReader2 precisely encapsulates gdcm::ImageRegionReader for binding with VTK.
If for some reason you cannot use this class directly, simply copy/paste the C++ code from within its main function into your C# code.
See:
http://gdcm.sourceforge.net/2.6/html/classvtkGDCMImageReader2.xhtml
http://gdcm.sourceforge.net/2.6/html/classgdcm_1_1ImageRegionReader.xhtml
I've Googled and read quite a bit on QR codes and the maximum data that can be stored based on the various settings, all of it in tabular format. I can't seem to find anything giving a formula or a proper explanation of how these values are calculated.
What I would like to do is this:
Present the user with a form, allowing them to choose Format, EC & Version.
Then they can type in some data and generate a QR code.
Done deal. That part is easy.
The addition I would like to include is a "remaining character count" so that they (the user) can see how much more data they can type in, as well as what effect the properties have on the storage capacity of the QR code.
Does anyone know where I can find the formula(s)? Or do I need to purchase ISO 18004:2006?
A formula to calculate the amount of data you can put in a QR code would be quite complex to derive, not to mention it would need some approximations for the calculation to be possible. The formula would have to calculate the number of modules dedicated to the data in your QR code based on its version, and then calculate how many codewords (which are sets of 8 modules) will be used for the error correction.
To calculate the number of modules that will be used for the data, you need to know how many modules will be used for the function patterns. While this is not a problem for the three finder patterns, the timing patterns or the version/format information, there is a problem with the alignment patterns, as their number depends on the QR code's version, meaning you would have to use a table at that point anyway.
For the second part, I have to say I don't know how to calculate the number of error-correcting codewords based on the correction capacity. For some reason, there are more error-correcting codewords used than there should be to match the error correction capacity; for example, a 6-H QR code can correct up to 32.6% of the data, instead of the 30% set by the H correction level.
In any case, as you can see, a formula would be quite complex to implement. Using a table, as already suggested, is probably the best thing you can do.
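To illustrate the table approach for the "remaining character count" feature, here is a minimal sketch. Only version 1 is filled in, with byte-mode capacities taken from the published capacity tables; the other versions and modes would be added from the same tables:
using System;
using System.Collections.Generic;

enum EcLevel { L, M, Q, H }

static class QrCapacity
{
    // (version, EC level) -> maximum characters in byte mode.
    // Version 1 values from the standard capacity tables; fill in versions 2..40 the same way.
    static readonly Dictionary<(int, EcLevel), int> ByteModeCapacity = new Dictionary<(int, EcLevel), int>
    {
        { (1, EcLevel.L), 17 },
        { (1, EcLevel.M), 14 },
        { (1, EcLevel.Q), 11 },
        { (1, EcLevel.H), 7 },
    };

    public static int RemainingCharacters(int version, EcLevel ec, string currentInput)
        => ByteModeCapacity[(version, ec)] - currentInput.Length;
}
You would look the value up whenever the user changes the version or EC level and subtract the current input length to drive the counter.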
I wrote the original AIM specification for QR Code back in the '90s for Denso Corporation, and was also project editor for both editions of the ISO/IEC 18004 standard. It was felt to be much easier for people producing code printing software to use a look-up table rather than calculate capacities from a formula - no easy job as there are several independent variables that have to be taken into account iteratively when parsing the text to be encoded to minimise its length in bits, in order to achieve the smallest symbol. The most crucial factor is the mix of characters in the data, the sequence and lengths of sub-strings of numeric, alphanumeric, Kanji data, with the overhead needed to signal each change of character set, then the required level of error correction. I did produce a guidance section for this which is contained in the ISO standard.
The storage is calculated from the QR mode and the version/type that you are using. More specifically, the calculation is based on how 'compressible' the characters are and what encoding mode the QR generator is allowed to use on the content present.
More information can be found at http://en.wikipedia.org/wiki/QR_code#Storage
Does anyone know how to (natively) get the maximum allowed file size for a given drive/folder/directory? For FAT16 it is ~2 GB, for FAT32 it was 4 GB as far as I remember, and for the newer NTFS versions it is something way beyond that.. let alone Mono and the underlying OSes.
Is there anything I can read out / retrieve that might give me a hint about that? Basically I know my app will produce single files bigger than 2 GB, and I want to check for that when the user sets the corresponding output path(s)...
Cheers & thanks,
-J
This may not be the ideal solution, but I will suggest the following anyway:
// Returns the maximum file size in bytes on the filesystem type of the specified drive.
long GetMaximumFileSize(string drive)
{
    var driveInfo = new System.IO.DriveInfo(drive);
    switch (driveInfo.DriveFormat)
    {
        case "FAT16":
            return 1000; // replace with actual limit
        case "FAT32":
            return 1000; // replace with actual limit
        case "NTFS":
            return 1000; // replace with actual limit
        default:
            throw new NotSupportedException($"Unknown filesystem: {driveInfo.DriveFormat}");
    }
}
// Examples:
var maxFileSize1 = GetMaximumFileSize("C"); // for the C drive
var maxFileSize2 = GetMaximumFileSize(absolutePath.Substring(0, 1)); // for whichever drive the given absolute path refers to
This page on Wikipedia contains a pretty comprehensive list of the maximum file sizes for various filesystems. Depending on the number of filesystems you want to handle in the GetMaximumFileSize function, you may want to use a Dictionary or even a simple data file rather than a switch statement.
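For example, the same lookup expressed with a Dictionary; the FAT16/FAT32/NTFS limits below are the commonly cited ones, so verify them against the Wikipedia table before relying on them:
using System.Collections.Generic;

static readonly Dictionary<string, long> MaxFileSizeByFormat = new Dictionary<string, long>
{
    { "FAT16", 2L * 1024 * 1024 * 1024 },          // ~2 GB
    { "FAT32", 4L * 1024 * 1024 * 1024 - 1 },      // 4 GB minus one byte
    { "NTFS",  16L * 1024 * 1024 * 1024 * 1024 },  // ~16 TB (implementation dependent)
};

long GetMaximumFileSize(string drive)
{
    var format = new System.IO.DriveInfo(drive).DriveFormat;
    return MaxFileSizeByFormat.TryGetValue(format, out var limit) ? limit : -1; // -1 = unknown
}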
Now, you may be able to retrieve the maximum file size directly using WMI, or perhaps even the Windows API, but those solutions will of course only be compatible with Windows (i.e. no luck with Mono/Linux). However, I would consider this a reasonably nice purely managed solution, despite the use of a lookup table, and it has the bonus of working reliably on all OSs.
Hope that helps.
How about using System.IO.DriveInfo.DriveFormat to retrieve the drive's file system (NTFS, FAT, etc.)? That ought to give you at least some idea of the supported file sizes.