Extract Specific Fields With Keywords

I am trying to match the following string for an interface to a security system:
*3824 04:57:04 24/02/16 ALARM(DC4) Input 1 (SI)Main Door Opened(DC2)
Please note that (DC4), (SI) and (DC2) are just the visual representation of ASCII control characters, so on the serial port each one is a single byte, not 4 or 5 bytes.
The system will be continuously sending messages in a similar format to the above and I will need to check each one and see if it requires further processing.
The word ALARM is my keyword, so if a message without ALARM in it comes through, I will ignore it (MATCH failed).
If the word ALARM appears in the message, then I need to get the location of the event and pass it on to other layers within my application.
Sample 1 *3824 04:57:04 24/02/16 ALARM(DC4) Input 1 (SI)Main Door Opened(DC2)
Sample 2 *3824 04:57:04 24/02/16 ALARM(DC4) Input 2 (SI)Back Door Opened(DC2)
So I need to extract everything between the (SI) and (DC2) ASCII characters as a string for further processing.
So Message 1 would match "Main Door Opened" and Message 2 would match "Back Door Opened".
The other layers in the application will then extract this string from the appropriate Group # field if the match is a success.
Thanks,
Daniel.

Try this:
([A-Z]+)(?:[^\)]+.){2}([^\(]+)
Regex101:
Input:
*3824 04:57:04 24/02/16 ALARM(DC4) Input 1 (SI)Main Door Opened(DC2)
Output:
MATCH 1
1. [24-29] `ALARM`
2. [47-63] `Main Door Opened`

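This one captures exactly the location text in group 1 (it matches (DC4), (SI) and (DC2) as literal text, as written in the question):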
ALARM\(DC4\).*\(SI\)(.*)(?=\(DC2\))
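Both patterns above match the visual placeholders literally. Since the question says each placeholder is really one control byte, a version that matches the bytes themselves may be closer to the real serial input. This is a sketch, assuming the serial data has already been decoded into a string with an ASCII-compatible encoding (for example `SerialPort`'s default ASCII encoding); `ExtractLocation` and the group name `location` are illustrative choices, not part of the original answers.

```csharp
using System;
using System.Text.RegularExpressions;

// (DC4), (SI) and (DC2) arrive as single control characters:
// DC4 = 0x14, SI = 0x0F, DC2 = 0x12.
// Pattern: the keyword ALARM followed by DC4, anything up to SI,
// then everything up to DC2 captured as the named group "location".
var alarmPattern = new Regex(@"ALARM\x14.*?\x0F(?<location>[^\x12]*)\x12");

// Returns the location text, or null when the message is not an alarm.
string? ExtractLocation(string message)
{
    Match match = alarmPattern.Match(message);
    return match.Success ? match.Groups["location"].Value : null;
}

// The two samples from the question, with the real control characters.
string sample1 = "*3824 04:57:04 24/02/16 ALARM\u0014 Input 1 \u000FMain Door Opened\u0012";
string sample2 = "*3824 04:57:04 24/02/16 ALARM\u0014 Input 2 \u000FBack Door Opened\u0012";

Console.WriteLine(ExtractLocation(sample1)); // Main Door Opened
Console.WriteLine(ExtractLocation(sample2)); // Back Door Opened
```

Returning null for anything without `ALARM` mirrors the "MATCH failed" case in the question, and the named group lets the other layers read `Groups["location"]` instead of relying on a group number.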

Related

Get correct characters from barcode reader independent on input method

I have a problem with barcode readers: if the barcode reader's country is set to USA and the input language on my PC (Windows 10) is Italian, when I scan a code containing special characters (like \ / :) I see other characters on screen (like ù or ç).
If I install and change my input method to USA, it reads characters correctly.
Likewise, if I set the barcode reader's country to Italy and leave the input method on the PC as Italian, it reads characters correctly.
I can't install the USA input method dynamically on customers' PCs, and I can't know how the barcode readers will be programmed. Is there a simple way to get the correct character sequence without changing the input method or the barcode reader's settings?

CNTK Input Data Structure for example: CSTrainingCPUOnlyExamples

I am using the CNTK example LSTMSequenceClassifier via the console application CSTrainingCPUOnlyExamples, with the default data file Train.ctf.
The input layer has dimension 2000 (one-hot vector), and the output has 5 classes (softmax).
The file is loaded via:
MinibatchSource minibatchSource = MinibatchSource.TextFormatMinibatchSource(Path.Combine(DataFolder, "Train.ctf"), streamConfigurations, MinibatchSource.InfinitelyRepeat, true);
StreamInformation featureStreamInfo = minibatchSource.StreamInfo(featuresName);
StreamInformation labelStreamInfo = minibatchSource.StreamInfo(labelsName);
I would really appreciate an explanation of how the data file is generated and how 2000 inputs map to 5 output classes.
My goal is to write an application that formats and saves data to a file that can be read as an input data file, so I need to understand the structure to make this work.
Thanks!
I see the Y dimension, and that part makes sense, but I am having trouble with the input layer.

Edit: @Frank Seide MSFT
I wonder if you can verify this and suggest best practices:
private string Format(int sequenceId, string featureName, string featureShape, string labelName, string featureComment, string labelShape, string labelComment)
{
    // Input names may not contain spaces, and a '|' inside a comment body must be escaped as "|#".
    return $"{sequenceId} |{featureName.Replace(" ", "-")} {featureShape} |# {featureComment.Replace("|", "|#")} |{labelName.Replace(" ", "-")} {labelShape} |# {labelComment.Replace("|", "|#")}\r\n";
}
which might return something like:
0 |x 560:1 |# I am a comment |y 1 0 0 0 0 |# I am a comment
Where:
sequenceId = 0;
featureName = "x";
featureShape = "560:1";
featureComment = "I am a comment";
labelName = "y";
labelShape = "1 0 0 0 0";
labelComment = "I am a comment";
On GPU, Frank suggested around 20 sequences per minibatch, see: https://www.youtube.com/watch?v=TK671HxrufE #26:25
This is for custom C# Dataset formatting.
End edit...
I found an answer by accident, in some documentation:
BrainScript CNTK Text Format Reader using CNTKTextFormatReader
The document goes on to explain:
CNTKTextFormatReader (later simply CTF Reader) is designed to consume input text data formatted according to the specification below. It supports the following main features:
Multiple input streams (inputs) per file
Both sparse and dense inputs
Variable length sequences
CNTK Text Format (CTF)
Each line in the input file contains one sample for one or more inputs. Since (explicitly or implicitly) every line is also attached to a sequence, it defines one or more <sequence, input, sample> relations. Each input line must be formatted as follows:
[Sequence_Id](Sample or Comment)+
where
Sample=|Input_Name (Value )*
Comment=|# some content
Each line starts with a sequence id and contains one or more samples (in other words, each line is an unordered collection of samples).
Sequence id is a number. It can be omitted, in which case the line number will be used as the sequence id.
Each sample is effectively a key-value pair consisting of an input name and the corresponding value vector (mapping to higher dimensions is done as part of the network itself).
Each sample begins with a pipe symbol (|) followed by the input name (no spaces), followed by a whitespace delimiter and then a list of values.
Each value is either a number or an index-prefixed number for sparse inputs.
Both tabs and spaces can be used interchangeably as delimiters.
A comment starts with a pipe immediately followed by a hash symbol: |#, then followed by the actual content (body) of the comment. The body can contain any characters, however a pipe symbol inside the body needs to be escaped by appending the hash symbol to it (see the example below). The body of a comment continues until the end of line or the next un-escaped pipe, whichever comes first.
Handy, and gives an answer.
The input data corresponding to the reader configuration above should look something like this:
|B 100:3 123:4 |C 8 |A 0 1 2 3 4 |# a CTF comment
|# another comment |A 0 1.1 22 0.3 54 |C 123917 |B 1134:1.911 13331:0.014
|C -0.001 |# a comment with an escaped pipe: '|#' |A 3.9 1.11 121.2 99.13 0.04 |B 999:0.001 918918:-9.19
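To check a generated file against these rules, a small parser helps. The sketch below is a hypothetical helper written from the rules quoted above (it is not part of CNTK's API): it reads the optional sequence id, collects each `|Name values` sample, skips comments, and treats `|#` inside a comment as an escaped pipe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical parser for one CTF line. Returns the optional sequence id and
// each input's raw value list; comments are skipped.
(string? SequenceId, Dictionary<string, string> Samples) ParseCtfLine(string line)
{
    int i = line.IndexOf('|');
    if (i < 0) throw new FormatException("a CTF line needs at least one |sample");

    // Anything before the first pipe is the (optional) sequence id.
    string prefix = line.Substring(0, i).Trim();
    string? sequenceId = prefix.Length > 0 ? prefix : null;
    var samples = new Dictionary<string, string>();

    while (i < line.Length) // invariant: line[i] == '|'
    {
        if (i + 1 < line.Length && line[i + 1] == '#')
        {
            // Comment: runs to the next un-escaped pipe; "|#" inside the body is an escaped pipe.
            int j = i + 2;
            while (j < line.Length)
            {
                bool isPipe = line[j] == '|';
                bool isEscaped = isPipe && j + 1 < line.Length && line[j + 1] == '#';
                if (isPipe && !isEscaped) break; // start of the next sample
                j += isEscaped ? 2 : 1;
            }
            i = j;
        }
        else
        {
            // Sample: "|Name v1 v2 ..." up to the next pipe or the end of the line.
            int end = line.IndexOf('|', i + 1);
            if (end < 0) end = line.Length;
            string body = line.Substring(i + 1, end - i - 1).Trim();
            int split = body.IndexOfAny(new[] { ' ', '\t' });
            string name = split < 0 ? body : body.Substring(0, split);
            string values = split < 0 ? "" : body.Substring(split + 1).Trim();
            // Each input may appear only once per line.
            if (samples.ContainsKey(name))
                throw new FormatException($"input '{name}' appears twice on one line");
            samples[name] = values;
            i = end;
        }
    }
    return (sequenceId, samples);
}

var (seq, inputs) = ParseCtfLine("|C -0.001 |# a comment with an escaped pipe: '|#' |A 3.9 1.11 121.2 99.13 0.04 |B 999:0.001 918918:-9.19");
Console.WriteLine(seq ?? "(no sequence id)");
foreach (var pair in inputs)
    Console.WriteLine($"{pair.Key}: {pair.Value}");
```

Values are returned as raw text, so dense (`0 1 2 3 4`) and sparse (`100:3 123:4`) vectors can both be inspected before a file is handed to the reader.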
Note the following about the input format:
|Input_Name identifies the beginning of each input sample. This element is mandatory and is followed by the corresponding value vector.
Dense vector is just a list of floating point values; sparse vector is a list of index:value tuples.
Both tabs and spaces are allowed as value delimiters (within input vectors) as well as input delimiters (between inputs).
Each separate line constitutes a "sequence" of length 1 ("Real" variable-length sequences are explained in the extended example below).
Each input identifier can only appear once on a single line (which translates into one sample per input per line requirement).
The order of input samples within a line is NOT important (conceptually, each line is an unordered collection of key-value pairs).
Each well-formed line must end with either a "Line Feed" \n or "Carriage Return, Line Feed" \r\n symbols.
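Putting the rules together for the LSTMSequenceClassifier case: a 2000-dimensional one-hot input can be written compactly as a sparse `index:1` entry, so each token of a sequence becomes one line such as `|x 560:1`, and lines that share a sequence id form one sequence. The sketch below is a hypothetical generator, not CNTK code; it assumes the input names `x` and `y` from the edit above, a 2000-word vocabulary, 5 classes, and that the dense one-hot label only needs to appear once per sequence (here on its first line).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: one CTF line per token. Every line carries the same
// sequence id, so the reader groups them into one variable-length sequence.
List<string> FormatSequence(int sequenceId, int[] tokenIndices, int label,
                            int inputDim = 2000, int numClasses = 5)
{
    if (tokenIndices.Length == 0)
        throw new ArgumentException("a sequence needs at least one token", nameof(tokenIndices));
    if (label < 0 || label >= numClasses)
        throw new ArgumentOutOfRangeException(nameof(label));

    var lines = new List<string>();
    for (int t = 0; t < tokenIndices.Length; t++)
    {
        int index = tokenIndices[t];
        if (index < 0 || index >= inputDim)
            throw new ArgumentOutOfRangeException(nameof(tokenIndices));

        // Sparse one-hot feature: only the non-zero entry is written, as index:value.
        string line = $"{sequenceId}\t|x {index}:1";
        if (t == 0)
        {
            // Dense one-hot label: numClasses values with a single 1 at the class index.
            string labelVector = string.Join(" ",
                Enumerable.Range(0, numClasses).Select(c => c == label ? "1" : "0"));
            line += $"\t|y {labelVector}";
        }
        lines.Add(line);
    }
    return lines;
}

foreach (string ctfLine in FormatSequence(0, new[] { 560, 1, 34 }, label: 0))
    Console.WriteLine(ctfLine);
```

Writing the result with `File.WriteAllLines(path, lines)` ends each line with `Environment.NewLine` (`\r\n` on Windows, `\n` elsewhere), and both endings satisfy the rule above.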
Some awesome content on the input and label data in these videos:
https://youtu.be/hMRrqkl77rI - #30:23
https://youtu.be/Vi05nEzAS8Y - #25:20
Also, helpful but not directly related: Read and feed data to CNTK Trainer

How do you delete text surrounding a string that you want?

I've looked online for this but not been able to find an answer unfortunately (sorry if there is something I have missed).
I have some code which filters out a specific string (which can change depending on what is read from the serial port). I want to be able to delete all of the characters which I am not using.
e.g. the string I want from the text below is "ThisIsTheStringIWant"
efefhokiehfdThisIsTheStringIWantcbunlokew
Now, I already have a function with some code which will identify this and print it to where I want. However, as the comms could be coming in from multiple ports at any frequency, before printing the string to where I want it, I need to have a piece of code which will recognise everything I don't want and delete it from my buffer.
e.g. Using the same random text above, I want to get rid of the two random strings at the ends (which are before and after "ThisIsTheStringIWant" in the middle).
efefhokiehfdThisIsTheStringIWantcbunlokew
I have tried using the highest-voted answer from this question: Remove characters after specific character in string, then remove substring? However, I can't find a way to delete the unwanted text before my wanted string.
If anyone can help, that would be great!
Thanks!
Edit:
Sorry, I should have probably made my question clearer.
Any number of characters could be before and/or after the actual string I want, and as the string I want is coming from a serial port, it will be different every time depending on what comms are coming in. In my application I have a cell in a DGV called "Extract", where I type the first bit of the comms I am expecting (in this case, the extract would be "This"), but that will be different depending on what I am doing.

Find the position of the string you want, delete everything from the beginning up to the character before that position, then delete everything that follows your string.
String: efefhokiehfdThisIsTheStringIWantcbunlokew
Step 1 - "ThisIsTheStringIWant" starts at position 13, so delete the first twelve, leaving...
String: ThisIsTheStringIWantcbunlokew
Step 2 - "ThisIsTheStringIWant" is 20 characters long, so delete from character 21 to the length of the string, leaving:
String: ThisIsTheStringIWant
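The two steps translate directly into C#. This sketch assumes the received data is held in a `StringBuilder` and that the whole wanted string is known; `KeepOnly` is a hypothetical helper name. C# indexes are zero-based, so the answer's position 13 is index 12.

```csharp
using System;
using System.Text;

// Hypothetical helper: trims 'buffer' in place so that only 'wanted' remains.
// Returns false (and leaves the buffer untouched) if 'wanted' is not present.
bool KeepOnly(StringBuilder buffer, string wanted)
{
    int start = buffer.ToString().IndexOf(wanted, StringComparison.Ordinal); // zero-based index
    if (start < 0) return false;

    // Step 1: delete everything in front of the match (the first 'start' characters).
    buffer.Remove(0, start);
    // Step 2: delete everything after the wanted text.
    buffer.Remove(wanted.Length, buffer.Length - wanted.Length);
    return true;
}

var serialBuffer = new StringBuilder("efefhokiehfdThisIsTheStringIWantcbunlokew");
KeepOnly(serialBuffer, "ThisIsTheStringIWant");
Console.WriteLine(serialBuffer); // ThisIsTheStringIWant
```

If only the beginning of the wanted text is known, as in the question's edit, the end still has to come from somewhere else, such as a known length or terminator.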

Control characters display format in hyperterminal

I am working with a serial device. When receiving data, I get a heart symbol along with the actual data.
I decoded the ASCII value and found that it is <ETX> (End of Text).
Why does it show a heart symbol for ETX?
What would be the display character for STX? Is there a list available for the other control characters?

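http://en.wikipedia.org/wiki/Code_page_437#Interpretation_of_code_points_1.E2.80.9331_and_127
The byte is being drawn with the glyphs of code page 437, the original IBM PC character set, which shows the control codes 1 to 31 and 127 as symbols: 0x03 (ETX) appears as ♥ and 0x02 (STX) as ☻. The linked table lists the glyph for every one of them.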
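A lookup table makes the relationship easy to see. This sketch pairs each ASCII control code name with the glyph code page 437 uses for the same byte, and `ToCp437Display` (a hypothetical helper) converts received bytes the way a CP437 terminal would draw them. Only 0x00 to 0x1F and 0x7F are mapped; bytes 0x80 and above are not converted to their CP437 glyphs.

```csharp
using System;
using System.Text;

// ASCII names for control codes 0x00-0x1F.
string[] controlNames =
{
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS",  "HT",  "LF",  "VT",  "FF",  "CR",  "SO",  "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM",  "SUB", "ESC", "FS",  "GS",  "RS",  "US"
};

// Glyphs that code page 437 assigns to the same byte values (0x00 shown as a blank).
string cp437Glyphs =
    " \u263A\u263B\u2665\u2666\u2663\u2660\u2022" +       // 0x00-0x07
    "\u25D8\u25CB\u25D9\u2642\u2640\u266A\u266B\u263C" +  // 0x08-0x0F
    "\u25BA\u25C4\u2195\u203C\u00B6\u00A7\u25AC\u21A8" +  // 0x10-0x17
    "\u2191\u2193\u2192\u2190\u221F\u2194\u25B2\u25BC";   // 0x18-0x1F

// Replaces each control byte with its CP437 glyph; 0x7F becomes the house glyph.
string ToCp437Display(byte[] data)
{
    var sb = new StringBuilder(data.Length);
    foreach (byte b in data)
    {
        if (b < 0x20) sb.Append(cp437Glyphs[b]);
        else if (b == 0x7F) sb.Append('\u2302');
        else sb.Append((char)b); // 0x20-0x7E are plain ASCII; 0x80+ is NOT mapped to CP437 here
    }
    return sb.ToString();
}

// Prints the table: code, ASCII name, CP437 glyph.
for (int code = 0; code < 0x20; code++)
    Console.WriteLine($"0x{code:X2} {controlNames[code],-3} {cp437Glyphs[code]}");
```

Whether the glyphs render correctly in a console window depends on its font and output encoding.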

Extracting data from text using templates

I'm building a web service which receives emails from a number of CRM-systems. Emails typically contain a text status e.g. "Received" or "Completed" as well as a free text comment.
The formats of the incoming emails are different, e.g. some systems call the status "Status: ZZZZZ" and some "Action: ZZZZZ". The free text sometimes appears before the status and sometimes after. Status codes will be mapped to my system's interpretation, and the comment is required too.
Moreover, I'd expect the formats to change over time, so a solution that is configurable, possibly by customers providing their own templates through a web interface, would be ideal.
The service is built using .NET C# MVC 3 but I'd be interested in general strategies as well as any specific libraries/tools/approaches.
I've never quite got my head around RegExp. I'll make a new effort in case it is indeed the way to go. :)

I would go with regex:
First example, if you only had "Status: ZZZZZ"-like messages (emailBody holds the text of the incoming email):
String status = Regex.Match(emailBody, @"(?<=Status: ).*").Value;
// Explanation of "(?<=Status: ).*" :
// (?<= Start of the positive look-behind group: it means that the
// following text is required but won't appear in the returned string
// Status: The text defining the email string format
// ) End of the positive look-behind group
// .* Matches any characters up to the end of the line
Second example, if you only had "Status: ZZZZZ" and "Action: ZZZZZ"-like messages:
String status = Regex.Match(emailBody, @"(?<=(Status|Action): ).*").Value;
// We added (Status|Action) that allows the positive look-behind text to be
// either 'Status: ', or 'Action: '
Now if you want to let users provide their own format, you could come up with something like:
String userEntry = GetUserEntry(); // Get the text submitted by the user
String userFormatText = Regex.Escape(userEntry);
String status = Regex.Match(emailBody, @"(?<=" + userFormatText + ").*").Value;
That would allow users to submit their own format, like Status:, or Action:, or This is my friggin format, now please read the status -->...
The Regex.Escape(userEntry) part is important to ensure that users don't break your regex by submitting special characters like \, ?, *...
To know if the user submits the status value before or after the format text, you have several solutions:
You could ask the user where the status value is, and then build your regex accordingly:
if (statusValueIsAfter) {
    // Example: "Status: Closed"
    regexPattern = @"(?<=Status: ).*";
} else {
    // Example: "Closed:Status"
    regexPattern = @".*(?=:Status)"; // We use here a positive look-AHEAD
}
Or you could be smarter and introduce a system of tags for the user entry. For instance, the user submits Status: <value> or <value>=The status, and you build the regex by replacing the tag string.
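The tag idea can be sketched like this, assuming a hypothetical `<value>` tag and a single-word status such as "Received" or "Completed": the text on each side of the tag goes through `Regex.Escape` (so users cannot break the pattern, as noted above) and the tag itself becomes a named group. `BuildTemplateRegex` and `ExtractStatus` are illustrative names, not an existing library.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical tag the user types to mark where the status value sits.
const string ValueTag = "<value>";

// Builds a regex from a user template such as "Status: <value>" or
// "<value>=The status". Text around the tag is escaped, so it matches literally.
Regex BuildTemplateRegex(string template)
{
    int tagAt = template.IndexOf(ValueTag, StringComparison.Ordinal);
    if (tagAt < 0 || template.IndexOf(ValueTag, tagAt + 1, StringComparison.Ordinal) >= 0)
        throw new ArgumentException("The template must contain exactly one <value> tag.", nameof(template));

    string before = Regex.Escape(template.Substring(0, tagAt));
    string after = Regex.Escape(template.Substring(tagAt + ValueTag.Length));
    // Assumption: a status is a single word (no whitespace).
    return new Regex(before + @"(?<value>\S+)" + after);
}

// Returns the status found in the email body, or null if the template does not match.
string? ExtractStatus(string template, string emailBody)
{
    Match match = BuildTemplateRegex(template).Match(emailBody);
    return match.Success ? match.Groups["value"].Value : null;
}

Console.WriteLine(ExtractStatus("Status: <value>", "Hello,\nStatus: Completed\nRegards")); // Completed
Console.WriteLine(ExtractStatus("<value>=The status", "Closed=The status"));              // Closed
```

Because the tag can sit anywhere in the template, the same code handles status values that come before or after the label text, so the user does not have to say which.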
