Trouble parsing NMEA data from Serial Port - c#

I'm retrieving NMEA sentences from a serial GPS. The strings are coming across as I would expect. The problem is when parsing a sentence like this:
$GPRMC,040302.663,A,3939.7,N,10506.6,W,0.27,358.86,200804,,*1A
I use a simple bit of code to make sure I have the right sentence:
string[] Words = sBuffer.Split(',');
foreach (string item in Words)
{
    if (item == "$GPRMC")
    {
        return "Correct Sentence";
    }
    else
    {
        return "Incorrect Sentence";
    }
}
I added the return in that location for the example. I have printed the split results to a text box and have seen that $GPRMC is indeed coming across in the item variable at some point. If the string is coming across, why won't the if statement catch it? Is it the $? How can I troubleshoot this?

It has been a while since I read an NMEA GPS...
Don't you need to compare the substring corresponding to the NMEA data type rather than every element of the NMEA buffer? The .Split method splits sBuffer on all the commas in the NMEA sentence, so you get each individual element. But then you test against "$GPRMC" inside a loop, which implies you want to look at every element, while the return in both branches means only the first element is ever checked. Confusing...
So wouldn't your test be better as:
string[] Words = sBuffer.Split(',');
if (String.Compare(Words[0], "$GPRMC") == 0)
{
    return "Correct Sentence";
}
else
{
    return "Incorrect Sentence";
}
Is there a possibility that the NMEA stream is outputting sentences other than the recommended minimum data (GPRMC) sentence, and you need to reread until you get the correct one? Also, are you sure that your GPS reports the data type as $GPRMC rather than GPRMC? I do not think the $ is supposed to be part of the data type.
i.e., in pseudocode:
do {
    buffer = read_NMEA(); // making sure the entire sentence is read...
    array = split(buffer, ",");
    data_type = array[0];
}
while (data_type != "GPRMC" && readcount++ < MAX_NMEA_READS)
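That loop could look roughly like this in C#. This is only a sketch under assumptions: sentences arrive one per line through a TextReader (with a live port, SerialPort.ReadLine, which reads up to the port's NewLine string, would play the same role), and NmeaReader/ReadUntilSentence are hypothetical names. It compares the first field exactly as Split returns it; for the sentence shown in the question that field is "$GPRMC", including the '$'.

```csharp
using System;
using System.IO;

public static class NmeaReader
{
    // Reads whole lines until one whose first comma-separated field equals
    // sentenceType, or until maxReads lines have been read.
    // Returns the matching (trimmed) sentence, or null if none was found.
    public static string ReadUntilSentence(TextReader reader, string sentenceType, int maxReads)
    {
        for (int readCount = 0; readCount < maxReads; readCount++)
        {
            string line = reader.ReadLine();   // one complete sentence per line
            if (line == null)
                return null;                   // end of stream

            string trimmed = line.Trim();      // drop stray \r, \n or spaces
            string[] fields = trimmed.Split(',');
            if (fields[0] == sentenceType)
                return trimmed;
        }
        return null;                           // gave up after maxReads lines
    }
}
```

Bounding the number of reads keeps the caller from blocking forever if the GPS never sends the wanted sentence type.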
To debug your loop, try a console write of the elements:
string[] Words = sBuffer.Split(',');
foreach (string item in Words)
{
    Console.WriteLine(item);
}
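A variation on that debugging loop, sketched on the assumption that the culprit is an invisible character (for example a "\n" left over from the previous sentence) glued to a field: plain Console.WriteLine would print "\n$GPRMC" looking just like "$GPRMC". The NmeaDebug/Visualize names are hypothetical.

```csharp
using System;
using System.Text;

public static class NmeaDebug
{
    // Wraps the field in brackets and escapes control characters,
    // so stray \r, \n or other non-printing bytes become visible.
    public static string Visualize(string field)
    {
        var sb = new StringBuilder("[");
        foreach (char c in field)
        {
            switch (c)
            {
                case '\r': sb.Append("\\r"); break;
                case '\n': sb.Append("\\n"); break;
                case '\t': sb.Append("\\t"); break;
                default:
                    if (char.IsControl(c))
                        sb.Append("\\u").Append(((int)c).ToString("X4")); // e.g. \u0007
                    else
                        sb.Append(c);
                    break;
            }
        }
        return sb.Append(']').ToString();
    }
}
```

Usage: `foreach (string item in sBuffer.Split(',')) Console.WriteLine(NmeaDebug.Visualize(item));`. A line such as `[\n$GPRMC]` would explain why `item == "$GPRMC"` is false.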

Are you calculating the checksum? I don't see it in your code.
See: NMEA Wiki
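A minimal sketch of validating the checksum, assuming the usual NMEA 0183 rule: XOR every character between the leading '$' and the '*', then compare the result with the two hex digits after the '*'. The class and method names are hypothetical.

```csharp
using System;
using System.Globalization;

public static class NmeaChecksum
{
    // Returns true if the sentence's trailing *HH checksum matches the
    // XOR of all characters between '$' and '*'.
    public static bool IsValid(string sentence)
    {
        sentence = sentence.Trim();                 // tolerate trailing \r\n
        int star = sentence.LastIndexOf('*');
        if (!sentence.StartsWith("$") || star < 0 || star + 3 > sentence.Length)
            return false;

        byte computed = 0;
        for (int i = 1; i < star; i++)              // skip the '$' itself
            computed ^= (byte)sentence[i];

        string expectedHex = sentence.Substring(star + 1, 2);
        return byte.TryParse(expectedHex, NumberStyles.HexNumber,
                             CultureInfo.InvariantCulture, out byte expected)
               && computed == expected;
    }
}
```

For example, in `$AB*03` the characters 'A' (0x41) and 'B' (0x42) XOR to 0x03, so it is valid, while `$AB*04` is not. Discarding sentences that fail this check also filters out partial reads from the serial buffer.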

EDIT: My answer below is no improvement. As commenter mtrw pointed out, == is overloaded by the string class to compare values. I was wrong.
To my mind, your if statement is faulty. Using the == operator, you are checking whether it is the same reference (which certainly will not be the case). To simply compare whether the two strings contain the same value, use String.Equals().
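To illustrate the correction in the EDIT: a small sketch (hypothetical class name) showing that string's == compares values, and that only a comparison typed as object falls back to reference equality.

```csharp
using System;

public static class StringEqualityDemo
{
    // Builds a string equal to "$GPRMC" at run time, so it is a
    // different object from the literal.
    public static string BuildAtRuntime() => new string("$GPRMC".ToCharArray());

    public static void Show()
    {
        string literal = "$GPRMC";
        string built = BuildAtRuntime();

        Console.WriteLine(literal == built);                       // True: string overloads == to compare values
        Console.WriteLine(literal.Equals(built));                  // True: the same value comparison
        Console.WriteLine(object.ReferenceEquals(literal, built)); // False: two distinct objects
        Console.WriteLine((object)literal == (object)built);       // False: object's == compares references
    }
}
```

So `item == "$GPRMC"` is a value comparison; if it is false, the two strings really do differ (often by an invisible character).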

Related

Split a string with Backslash

I'm trying to split a long string with a lot of data in it. The data uses the separator character '\r', and I wanted to place each row into a List<> using string.Split(). When I use the backslash as the separator, this doesn't work. I have read the thread below, but it doesn't help me because it still won't separate the data into rows. The idea was to create a list and later place all the info in a DataGridView, so I need to split out every separate piece of information.
Split string with backslash
This is the code I have tried (among many other things):
private void ExtractMeasurements(string Data)
{
    int TotalMeasurements = frmMain.MeasTotal;
    List<string> Measurements = new List<string>();
    Measurements.AddRange(Data.Split('\\'));
}
This is what the raw string looks like:
And here is the result when I try to split the string:
Am I making a silly mistake somewhere, or is my strategy completely wrong?
Adapted from this answer: How to convert from ascii code to split character C#, and @Johnny Mopp's comments on the OP's question.
This could be an approach that works for you. It uses the actual ASCII code as the value to split on; ASCII code 13 is the carriage return.
Measurements.AddRange(Data.Split((char)13));
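Why splitting on '\\' finds nothing: in C# source, "\r" is an escape sequence for a single carriage-return character (code 13), so data shown as "\r" contains no backslash at all. A minimal sketch; the RowSplitter name and the choice to treat "\r\n", "\r" and "\n" all as row breaks are assumptions for illustration.

```csharp
using System;

public static class RowSplitter
{
    // Splits raw measurement data into rows. "\r\n", "\r" and "\n" are each
    // treated as one line break; empty rows are dropped.
    public static string[] SplitRows(string data) =>
        data.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries);

    public static void Show()
    {
        string data = "12.5\r13.1\r14.0\r";
        Console.WriteLine(data.Contains("\\"));     // False: there is no backslash to split on
        Console.WriteLine('\r' == (char)13);        // True: the same character
        Console.WriteLine(data.Split('\\').Length); // 1: the whole string comes back unsplit
        Console.WriteLine(SplitRows(data).Length);  // 3
    }
}
```

Handling "\r\n" explicitly matters if the device ever sends Windows-style line endings: splitting on '\r' alone would leave a "\n" at the start of each following row.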
OK, the issue was solved easily (with what I wanted to do from the start):
private void ExtractMeasurements(string Data)
{
    int TotalMeasurements = frmMain.MeasTotal;
    List<string> Measurements = new List<string>();
    Measurements.AddRange(Data.Split('\r'));
}
This gives me the following result:

Split string returning an extra string at the end of the returned array

I am trying to split a string using ; as a delimiter.
My output is weird: why is there an empty string at the end of the returned array?
string emails = "bitebari@gmail.com;abcd@gmail.com;";
string[] splittedEmails = emails.TrimEnd().Split(';');
foreach (var email in splittedEmails)
{
    Console.WriteLine("Value is: " + email);
}
The console output looks like this:
Value is: bitebari@gmail.com
Value is: abcd@gmail.com
Value is:
The string.Split method doesn't remove empty entries by default, but you can tell it to do so by passing StringSplitOptions.RemoveEmptyEntries:
string[] splittedEmails = emails.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries); // Split(';', StringSplitOptions.RemoveEmptyEntries) also works on .NET Core 2.0 and later
Alternatively, pass ';' to your TrimEnd call. Without arguments, TrimEnd only removes trailing white space, so your string keeps the ';' at the end. That gives:
string[] splittedEmails = emails.TrimEnd(';').Split(';');
Both solutions work; which one to use is mostly a matter of preference, as the performance difference should be negligible.
Edit
This behavior is considered 'standard', at least in C#. Let me quote MSDN:
This behavior makes it easier for formats like comma separated values (CSV) files representing tabular data. Consecutive commas represent a blank column.
You can pass an optional StringSplitOptions.RemoveEmptyEntries parameter to exclude any empty strings in the returned array. For more complicated processing of the returned collection, you can use LINQ to manipulate the result sequence.
Also, a trailing separator is not treated as a special case.
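The two fixes are not quite equivalent, which may matter when choosing between them. A sketch (hypothetical class name) showing that RemoveEmptyEntries also drops empty entries in the middle, such as the one produced by ";;", whereas TrimEnd(';') only removes trailing separators.

```csharp
using System;

public static class EmailSplitDemo
{
    // Drops every empty entry, wherever it occurs.
    public static string[] WithRemoveEmpty(string s) =>
        s.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);

    // Drops only the empty entries caused by trailing ';' characters.
    public static string[] WithTrimEnd(string s) =>
        s.TrimEnd(';').Split(';');
}
```

For "x@a.com;;y@b.com;", WithRemoveEmpty returns two addresses, while WithTrimEnd returns three entries, the middle one empty, which matches the CSV "blank column" behavior quoted above.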

easiest way to get each word of e-mail (text file) into an array C#

I am trying to build a phishing scanner for a class project, and I am stuck on getting an e-mail saved in a text file to copy properly into an array for later processing. What I want is for each word to be in its own array index.
Here is my sample e-mail:
Subject: Insufficient Funds Notice
Date: September 25, 2013
Insufficient Funds Notice
Unfortunately, on 09/25/2013 your available balance in your Wells Fargo account XXXXXX4653 was insufficient to cover one or more of your checks, Debit Card purchases, or other transactions.
An important notice regarding one or more of your payments is now available in your Messages & Alerts inbox.
To read the message, click here, and first confirm your identity.
Please make deposits to cover your payments, fees, and any other withdrawals or transactions you have initiated. If you have already taken care of this, please disregard this notice.
We appreciate your business and thank you for your prompt attention to this matter.
If you have questions after reading the notice in your inbox, please refer to the contact information in the notice. Please do not reply to this automated email.
Sincerely,
Wells Fargo Online Customer Service
wellsfargo.com | Fraud Information Center
4f57e44c-5d00-4673-8eae-9123909604b6
I don't want any of the punctuation; all I need are the words and numbers.
Here is the code I have written for it so far.
StreamReader sr1 = new StreamReader(lblDisplaySelectedFilePath.Text);
string line = sr1.ReadToEnd();
words = line.Split(' ');
int wordslowercount = 0;
foreach (string word in words)
{
    words[wordslowercount] = word.ToLower();
    wordslowercount = wordslowercount + 1;
}
The issue with the above code is that I keep getting words that are strung together and/or have "\r" or "\n" attached. Here is an example of what ends up in the array that I don't want:
"notice\r\ndate:" I don't want the \r, \n, or the :, and the two words should be in different indexes.
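The regex \W matches any single non-word character (anything other than letters, digits and underscore), so splitting on it gives you a list of words without punctuation; the Where call filters out the empty strings left between consecutive separators.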
Regex.Split(inputString, "\\W").Where(x => !string.IsNullOrWhiteSpace(x));
using System;
using System.Text.RegularExpressions;

public class Example
{
    static string CleanInput(string strIn)
    {
        // Replace invalid characters with empty strings.
        try {
            return Regex.Replace(strIn, @"[^\w\.@-]", "",
                                 RegexOptions.None, TimeSpan.FromSeconds(1.5));
        }
        // If we timeout when replacing invalid characters,
        // we should return Empty.
        catch (RegexMatchTimeoutException) {
            return String.Empty;
        }
    }
}
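A self-contained sketch of the Regex.Split approach for the e-mail, lower-casing as the question's loop does. It splits on runs of non-word characters (\W+) rather than single ones, which reduces empty entries; the class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EmailTokenizer
{
    // Splits on runs of non-word characters (spaces, \r, \n, punctuation),
    // drops empty strings, and lower-cases each word.
    // Note: \w also matches digits and '_', so "XXXXXX4653" stays one token.
    public static string[] Tokenize(string text) =>
        Regex.Split(text, @"\W+")
             .Where(w => w.Length > 0)
             .Select(w => w.ToLowerInvariant())
             .ToArray();
}
```

With this, "notice\r\ndate:" becomes the two entries "notice" and "date". One side effect to be aware of: "09/25/2013" and "wellsfargo.com" are broken into several tokens.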
Using line.Split(null) will split on white-space. From the C# String.Split method documentation:
If the separator parameter is null or contains no characters, white-space characters are assumed to be the delimiters. White-space characters are defined by the Unicode standard and return true if they are passed to the Char.IsWhiteSpace method.
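A sketch of the Split(null) idea applied to the e-mail. Casting the null as (char[])null selects the Split(char[], StringSplitOptions) overload, whose null separator also means "white space", so RemoveEmptyEntries can drop the empty strings that "\r\n" and double spaces leave behind. The set of punctuation trimmed from word ends is a hypothetical choice, and punctuation inside a token (as in "wellsfargo.com") is kept.

```csharp
using System;
using System.Linq;

public static class WhitespaceTokenizer
{
    // Hypothetical choice of punctuation to strip from the ends of each word.
    private static readonly char[] PunctuationToTrim =
        { '.', ',', ':', ';', '!', '?', '|', '"', '(', ')' };

    // A null separator array means "split on white space" (spaces, \r, \n, \t, ...).
    public static string[] Tokenize(string text) =>
        text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => w.Trim(PunctuationToTrim).ToLowerInvariant())
            .Where(w => w.Length > 0)   // e.g. a lone "|" trims to ""
            .ToArray();
}
```

Compared with the regex version, this keeps tokens such as "wellsfargo.com" and "09/25/2013" intact, which may or may not be what a phishing scanner wants.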

Converting integers to ASCII results in beeping and illegible text?

I've never encountered a problem like this in a console application - I hear some rather frightening beeping, the program (and my computer) freezes momentarily, and then the console just stops working - it doesn't even print an error message.
I have this text file. I read it as follows and then split the numbers into a list:
string path = @"C:\Users\owner\Documents\Quick Access\cipher1.txt";
string data = "";
using (StreamReader sr = new StreamReader(path))
{
    data = sr.ReadToEnd();
}
List<int> encryptedNums = new List<int>();
foreach (string s in data.Split(','))
{
    encryptedNums.Add(Convert.ToInt32(s));
}
Each number represents an ASCII character. I want to concatenate these numbers into a string:
string encryptedString = "";
//WTF????
foreach (int n in encryptedNums)
{
    encryptedString += (char)n;
}
The foreach loop results in some very weird conversions. By printing out the characters one by one and comparing them with an ASCII table, I see that the results are definitely not matching - for example, the number 2 results in a smiley-type figure. One of the conversions results in the beeping noise.
Here's what really stymies me. I have a separate method where I do essentially the same thing - I have a list of integers and I try to convert them to a string:
string s = "";
foreach (int n in decrypted)
{
    s += (char)n;
}
But this actually results in a proper string! I don't see where my error is, and why the first conversion fails, while the second conversion (and unless I'm missing something, the code is the same) is fine.
I'd appreciate any help.
The code that ultimately causes the crash is printing encryptedString.
Here is the full code.
The reason the second works and the first does not is the value of n is different.
Visible characters are in the range 32 to 126 (plus 9, 10 and 13 for \t, \n and \r respectively). If you are not getting ints in that range, you are doing the "decryption" incorrectly (from your code example, you have not done any decryption at all).
You must do something to the list of ints in the text file to turn it into legible text.
Your problem is the bell character, '\a' (ASCII 7).
http://en.wikipedia.org/wiki/Bell_character
Printing that character usually results in a beep and doing so multiple times in a short time span causes the application to freeze on most systems.
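A sketch for inspecting the numbers safely before printing them. It does not decrypt anything; it only replaces codes outside the legible range given above (32 to 126, plus 9, 10 and 13) with a placeholder, so the bell and other control characters never reach the console. The names and the '?' placeholder are arbitrary choices. It also uses a StringBuilder instead of repeated string +=.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class AsciiDecoder
{
    // True for printable ASCII (32..126) plus tab, line feed and carriage return.
    public static bool IsPrintable(int code) =>
        (code >= 32 && code <= 126) || code == 9 || code == 10 || code == 13;

    // Converts codes to text, substituting the placeholder for anything
    // non-printable (such as 7, the bell).
    public static string ToSafeString(IEnumerable<int> codes, char placeholder = '?')
    {
        var sb = new StringBuilder();
        foreach (int n in codes)
            sb.Append(IsPrintable(n) ? (char)n : placeholder);
        return sb.ToString();
    }
}
```

If most of the output comes back as placeholders, that is a strong hint the ints still need to be decrypted.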

parsing words in a continuous string

If I have a string of words with no spaces, how should I parse those words, given that I have a dictionary/list that contains them?
For example, if my string is "thisisastringwithwords", how could I use a dictionary to create the output "this is a string with words"?
I hear that a trie could help, but could someone help with the pseudocode? For example, I was thinking that you could index the dictionary into a trie, then follow each char down the trie; the problem is, I'm unfamiliar with how to do this in (pseudo)code.
I'm assuming that you want an efficient solution, not the obvious one where you repeatedly check if your text starts with a dictionary word.
If the dictionary is small enough, I think you could try and modify the standard KMP algorithm. Basically, build a finite-state machine on your dictionary which consumes the text character by character and yields the constructed words.
EDIT: It appeared that I was reinventing tries.
I already did something similar. You cannot use a simple dictionary; the result will be messy. It depends on whether you only have to do this once or as part of a whole program.
My solution was to:
1. Connect to a database of valid words from a dictionary list (for example, an online dictionary).
2. Filter long and short words in the dictionary and decide whether to trim anything (for example, don't use one-character words like 'I').
3. Start with short words and compare your bigString with the database dictionary.
4. Create a "table of possibilities", because many words can match 100% and still be wrong. The longer the word, the more certain you can be that it is the right one.
It is CPU-intensive, but the result can be precise.
So let's say you are using a small dictionary of 10,000 words, and 3,000 of them are 8 characters long: you need to compare the start of your bigString with all 3,000 words, and only if a match is found are you allowed to proceed to the next word. If you have 2,000 characters in your bigString, you need at least about 2,000 chars / 8 average chars = 250 full comparison loops.
I also added a small check for misspelled words to the comparison.
example of procedure (don't copy paste)
Dim bigString As String = "helloworld.thisisastackoverflowtest!".ToLower() 'compare case-insensitively
Dim dictionary As New List(Of String) 'contains the original words
dictionary.Add("Hello")
dictionary.Add("World")
dictionary.Add("this")
dictionary.Add("is")
dictionary.Add("a")
dictionary.Add("stack")
dictionary.Add("over")
dictionary.Add("flow")
dictionary.Add("stackoverflow")
dictionary.Add("test")
dictionary.Add("!")
'remove short words and lower-case the rest (a For Each loop cannot modify the list it iterates over)
dictionary = dictionary.Where(Function(w) w.Length >= 1).Select(Function(w) w.ToLower()).ToList()
Dim result As New List(Of String) 'the segmented words
Dim i As Integer = 0 'start at the beginning
Do While i < bigString.Length
    Dim ResultComparer As New Dictionary(Of String, Double) 'String is the dictionary word, Double is its weight
    For Each word In dictionary
        If bigString.Substring(i).StartsWith(word, StringComparison.Ordinal) Then
            ResultComparer(word) = word.Length 'placeholder weight: long words are better
        End If
    Next
    If ResultComparer.Count > 0 Then
        Dim best As String = ResultComparer.OrderByDescending(Function(kv) kv.Value).First().Key
        result.Add(best)
        i += best.Length
    Else
        i += 1 'no word starts here (e.g. "."), skip one character
    End If
Loop
I told you that it seems like an impossible task. But you can have a look at this related SO question - it may help you.
If you are sure you have all the words of the phrase in the dictionary, you can use this algorithm:
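String phrase = "thisisastringwithwords";
String fullPhrase = "";
Set<String> myDictionary = new HashSet<>(Arrays.asList("this", "is", "a", "string", "with", "words")); // example dictionary
do {
    boolean matched = false;
    for (String item : myDictionary) {
        if (phrase.startsWith(item)) {
            fullPhrase += item + " ";
            phrase = phrase.substring(item.length()); // strings are immutable: keep the remainder
            matched = true;
            break;
        }
    }
    if (!matched) break; // no dictionary word fits here: stop instead of looping forever
} while (phrase.length() != 0);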
There are many complications, such as dictionary words that start the same way, so in practice the code would need to use some kind of tree search (a BST or similar).
This is the exact problem one has when trying to programmatically parse languages like Chinese where there are no spaces between words. One method that works with those languages is to start by splitting text on punctuation. This gives you phrases. Next you iterate over the phrases and try to break them into words starting with the length of the longest word in your dictionary. Let's say that length is 13 characters. Take the first 13 characters from the phrase and see if it is in your dictionary. If so, take it as a correct word for now, move forward in the phrase and repeat. Otherwise, shorten your substring to 12 characters, then 11 characters, etc.
This works extremely well, but not perfectly because we've accidentally put in a bias towards words that come first. One way to remove this bias and double check your result is to repeat the process starting at the end of the phrase. If you get the same word breaks you can probably call it good. If not, you have an overlapping word segment. For example, when you parse your sample phrase starting at the end you might get (backwards for emphasis)
words with string a Isis th
At first, the word Isis (Egyptian Goddess) appears to be the correct word. When you find that "th" is not in your dictionary, however, you know there is a word segmentation problem nearby. Resolve this by going with the forward segmentation result "this is" for the non-aligned sequence "thisis" since both words are in the dictionary.
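A compact C# sketch of the forward and backward passes described above (often called forward and backward maximum matching). Assumptions: the dictionary is a non-empty, case-sensitive HashSet<string>, and when no word fits at a position the code emits a single character, a fallback the answer does not specify. Comparing the two results flags the regions that need resolving.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaxMatch
{
    // Forward pass: at each position take the longest dictionary word that
    // fits, shortening the candidate one character at a time.
    public static List<string> Forward(string text, HashSet<string> dict)
    {
        int maxLen = dict.Max(w => w.Length);
        var result = new List<string>();
        int pos = 0;
        while (pos < text.Length)
        {
            int len = Math.Min(maxLen, text.Length - pos);
            while (len > 1 && !dict.Contains(text.Substring(pos, len)))
                len--;                                  // falls back to one character
            result.Add(text.Substring(pos, len));
            pos += len;
        }
        return result;
    }

    // Backward pass: the same idea, starting from the end of the phrase.
    public static List<string> Backward(string text, HashSet<string> dict)
    {
        int maxLen = dict.Max(w => w.Length);
        var result = new List<string>();
        int end = text.Length;
        while (end > 0)
        {
            int len = Math.Min(maxLen, end);
            while (len > 1 && !dict.Contains(text.Substring(end - len, len)))
                len--;
            result.Add(text.Substring(end - len, len));
            end -= len;
        }
        result.Reverse();                               // restore reading order
        return result;
    }
}
```

With the dictionary {this, is, isis, a, string, with, words}, the forward pass yields "this is a string with words", while the backward pass yields "t h isis a string with words"; the single leftover characters mark the misaligned segment "thisis".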
A less common variant of this problem is when adjacent words share a sequence which could go either way. If you had a sequence like "archand" (to make something up), should it be "arc hand" or "arch and"? The way to determine is to apply a grammar checker to the results. This should be done to the whole text anyway.
OK, I will make a hand-wavy attempt at this. The perfect(ish) data structure for your problem is, as you said, a trie made up of the words in the dictionary. A trie is best visualised as a DFA, a nice state machine where you go from one state to the next on every new character. This is really easy to do in code; a Java(ish) class for this would be:
class State
{
    String matchedWord;
    Map<Character, State> mapChildren;
}
From here on, building the trie is easy. It's like having a rooted tree structure with each node having multiple children, where each child is visited on one character transition. A HashMap-like structure cuts down the time needed to look up the next State for a character. Alternatively, if all you have are the 26 letters of the alphabet, a fixed-size array of 26 would do the trick as well.
Now, assuming all of that made sense, you have a trie, but your problem still isn't fully solved. This is where you start doing what regular expression engines do: walk down the trie, keep track of states that match a whole word in the dictionary (that's what matchedWord in the State class is for), and use some backtracking logic to jump back to a previous match state if the current trail hits a dead end. I know it's general, but given the trie structure, the rest is fairly straightforward.
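A C# sketch of that walk-and-backtrack idea. TrieState mirrors the State class above (MatchedWord, Children); TrieState and TrieSegmenter are hypothetical names. Trying the longest match first is a choice on my part, not something the answer prescribes, and without memoization the backtracking can take exponential time on adversarial input.

```csharp
using System;
using System.Collections.Generic;

public sealed class TrieState
{
    public string MatchedWord;   // non-null if a dictionary word ends at this state
    public readonly Dictionary<char, TrieState> Children = new Dictionary<char, TrieState>();
}

public static class TrieSegmenter
{
    public static TrieState Build(IEnumerable<string> words)
    {
        var root = new TrieState();
        foreach (string word in words)
        {
            TrieState node = root;
            foreach (char c in word)
            {
                if (!node.Children.TryGetValue(c, out TrieState next))
                {
                    next = new TrieState();
                    node.Children[c] = next;
                }
                node = next;
            }
            node.MatchedWord = word;
        }
        return root;
    }

    // Returns a segmentation of text[start..], or null if there is none.
    public static List<string> Segment(string text, TrieState root, int start = 0)
    {
        if (start == text.Length)
            return new List<string>();

        // Walk down the trie, remembering every state that completes a word.
        var matches = new List<string>();
        TrieState node = root;
        for (int i = start; i < text.Length; i++)
        {
            if (!node.Children.TryGetValue(text[i], out TrieState next))
                break;                                   // dead end in the trie
            node = next;
            if (node.MatchedWord != null)
                matches.Add(node.MatchedWord);
        }

        // Try the longest match first; backtrack to shorter ones on failure.
        for (int m = matches.Count - 1; m >= 0; m--)
        {
            List<string> rest = Segment(text, root, start + matches[m].Length);
            if (rest != null)
            {
                rest.Insert(0, matches[m]);
                return rest;
            }
        }
        return null;
    }
}
```

For example, with the words {t, hi, sis, this, is, a, string, with, words}, Segment returns "this is a string with words", and it returns null for text that cannot be covered by dictionary words.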
If you have a dictionary of words and need a quick implementation, this can be solved efficiently with dynamic programming in O(n^2) time, assuming dictionary lookups are O(1). Below is some C# code; the substring extraction and dictionary lookup could be improved.
public static String[] StringToWords(String str, HashSet<string> words)
{
    //bps[j] = start index of the last word in a segmentation of str[0..j), or -1
    int[] bps = new int[str.Length + 1];
    for (int i = 0; i < bps.Length; i++)
        bps[i] = -1;

    for (int i = 0; i < str.Length; i++)
    {
        //Only extend from positions that a valid segmentation can reach
        if (i > 0 && bps[i] == -1)
            continue;

        for (int j = i + 1; j <= str.Length; j++)
        {
            if (bps[j] == -1)
            {
                //Destination cell doesn't have a valid backpointer yet
                //Try with the current substring
                String s = str.Substring(i, j - i);
                if (words.Contains(s))
                    bps[j] = i;
            }
        }
    }

    //Backtrack to recover the sequence, then reverse it
    List<String> seg = new List<string>();
    for (int bp = str.Length; bps[bp] != -1; bp = bps[bp])
        seg.Add(str.Substring(bps[bp], bp - bps[bp]));
    seg.Reverse();
    return seg.ToArray();
}
Building a HashSet with the word list from /usr/share/dict/words and testing with
foreach (var s in StringSplitter.StringToWords("thisisastringwithwords", dict))
    Console.WriteLine(s);
I get the output "t hi sis a string with words". As others have pointed out, this algorithm returns a valid segmentation (if one exists), but it may not be the segmentation you expect. The presence of short words reduces the segmentation quality; you might be able to add a heuristic that favours longer words when two valid sub-segmentations reach the same position.
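One possible version of that heuristic, sketched as a separate method: the same dynamic programming, but each cell keeps the backpointer that gives the fewest words so far instead of the first one found. Fewer words over the same text means longer words on average. Minimizing the word count is my choice of heuristic, not the answer's, and FewestWordsSegmenter is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class FewestWordsSegmenter
{
    // Returns the segmentation of str with the fewest dictionary words,
    // or an empty array if no segmentation exists.
    public static string[] Segment(string str, HashSet<string> words)
    {
        int maxLen = 0;
        foreach (string w in words) maxLen = Math.Max(maxLen, w.Length);

        int n = str.Length;
        int[] count = new int[n + 1];   // fewest words covering str[0..j), or -1
        int[] back = new int[n + 1];    // start index of the last word in that cover
        for (int j = 1; j <= n; j++) count[j] = -1;

        for (int j = 1; j <= n; j++)
        {
            for (int i = Math.Max(0, j - maxLen); i < j; i++)
            {
                if (count[i] == -1) continue;                      // str[0..i) not coverable
                if (!words.Contains(str.Substring(i, j - i))) continue;
                if (count[j] == -1 || count[i] + 1 < count[j])
                {
                    count[j] = count[i] + 1;
                    back[j] = i;
                }
            }
        }

        if (count[n] == -1) return new string[0];
        var seg = new List<string>();
        for (int j = n; j > 0; j = back[j])
            seg.Add(str.Substring(back[j], j - back[j]));
        seg.Reverse();
        return seg.ToArray();
    }
}
```

With a dictionary containing both {t, hi, sis} and {this, is}, this picks "this is a string with words" (6 words) over "t hi sis a string with words" (7 words). It is still only a heuristic; a language model is the more principled way to rank candidates.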
There are more sophisticated methods, such as finite-state machines and language models, that can generate multiple segmentations and rank them probabilistically.
