How do I capture named groups in C# .NET regex? [closed] - c#

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 12 years ago.
I'm trying to use named groups to parse a string.
An example input is:
exitcode: 0; session id is RDP-Tcp#2
and my attempted regex is:
("(exitCode)+(\s)*:(\s)*(?<exitCode>[^;]+)(\s)*;(\s)*(session id is)(\s)*(?<sessionID>[^;]*)(\s)*");
Where is my syntax wrong?
Thanks

In your example:
exitcode: 0; session id is RDP-Tcp#2
It does not end with a semi-colon, but it seems your regular expression expects a semi-colon to mark the end of sessionID:
(?<sessionID>[^;]*)
I notice that immediately after both of your named groups you have optional whitespace matches, so it may help to add whitespace to the negated character classes, like this:
(?<exitCode>[^;\s]+)
(?<sessionID>[^;\s]*)
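Here is a minimal C# sketch of reading the named groups back, assuming the input always has the shape shown in the question. ExitLineParser and TryParse are hypothetical names. The original attempt has two further problems. First, if the pattern is passed as an ordinary string literal, as the ("..."); suggests, C# rejects \s as an unrecognized escape sequence, so this sketch uses a verbatim @"..." string. Second, the pattern spells exitCode while the input says exitcode, so the literal only matches with RegexOptions.IgnoreCase or when it is spelled the same way as the input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExitLineParser
{
    // Verbatim string (@"...") so that \s reaches the regex engine instead of being
    // treated as a C# escape sequence. IgnoreCase lets "exitcode" match however the
    // log capitalizes it.
    private static readonly Regex Pattern = new Regex(
        @"exitcode\s*:\s*(?<exitCode>[^;\s]+)\s*;\s*session id is\s*(?<sessionID>[^;\s]*)",
        RegexOptions.IgnoreCase);

    public static bool TryParse(string line, out string exitCode, out string sessionId)
    {
        Match match = Pattern.Match(line);
        // Named groups are read back by name through Match.Groups.
        exitCode = match.Success ? match.Groups["exitCode"].Value : null;
        sessionId = match.Success ? match.Groups["sessionID"].Value : null;
        return match.Success;
    }
}
```

Because both negated classes exclude \s, the session ID stops at the first whitespace character. This relies on the assumption that session IDs never contain spaces.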
Even better, split the string on the semicolon first, and then perhaps you don't even need a regular expression. After splitting, you'd have these two substrings. The exit code and session ID happen to be at the ends of the strings, which makes them easy to parse in any number of ways:
exitcode: 0
session id is RDP-Tcp#2
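If you take the split-first route, a sketch with no regular expression at all might look like this. It assumes the line always has the shape "exitcode: <code>; session id is <id>". The exit code is taken as whatever follows the first colon, and the session ID as the last space-separated token. ExitLineSplitter and Parse are hypothetical names.

```csharp
using System;

public static class ExitLineSplitter
{
    // Assumes input like "exitcode: 0; session id is RDP-Tcp#2".
    public static (string ExitCode, string SessionId) Parse(string line)
    {
        string[] parts = line.Split(';');
        if (parts.Length < 2) throw new FormatException("Expected two ';'-separated parts.");

        // Exit code: everything after the first ':' in the first part.
        string first = parts[0];
        string exitCode = first.Substring(first.IndexOf(':') + 1).Trim();

        // Session ID: the last space-separated token of the second part.
        string second = parts[1].Trim();
        string sessionId = second.Substring(second.LastIndexOf(' ') + 1);

        return (exitCode, sessionId);
    }
}
```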

Richard's answer already covers it: either remove the semicolon at the end or make it optional, and it should work. Definitely consider putting whitespace in the negated classes or just splitting on the semicolon. Here's a little extra food for thought. :)
Don't bother with \s where it isn't necessary. Your input looks like some kind of log output, so it should be fairly predictable. If it is, something simpler will do:
exitcode: (?<exitCode>\d+);\s+session id is\s+(?<sessionID>[^;\s]*);?
If you split on the semicolon, you'll get an array of two strings. Here it is in C#, assuming the exit code is numeric and the session ID doesn't contain spaces:
using System.Text.RegularExpressions;
string[] splitResult = Regex.Split(input, @"\s*;\s*");
string exitCode = Regex.Match(splitResult[0], @"\d+").Value;
string sessionId = Regex.Match(splitResult[1], @"\S*$").Value;
Depending on who will be maintaining the code, this might be considered more readable than the above expression.

Related

Meaning of -1 in Programming [closed]

Closed 9 years ago.
I hate to ask this question here, but I have searched both SO and Google without success. In many places I have seen statements such as while(var != -1), along with other statements (often loops) that refer to -1 in some way. Is there a certain meaning to the use of -1, or is it just used as an integer representation of a boolean, or something like that? I have mainly seen this in C# programming, if that is any help.
In C#, -1 is just negative one. The code is comparing a number against a number and checking whether it is equal to negative one.
It's not uncommon to have an integer field that should only have positive values (for example, when representing an index in a list) and in such cases -1 is sometimes used to represent "not a valid value", for example, there is no item, and hence no index. They use -1 because an int is not nullable; they cannot assign null.
In theory this is probably bad practice, because it uses a "magic value" to mean more than the number really should. Ideally, if "there is no valid value" is a legitimate state for the variable, the variable should be a nullable integer (int? or Nullable<int>). However, this is an old convention, carried over from languages without nullable integers, so it's hard to eliminate entirely.
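To make the contrast concrete, here is a small C# sketch of the same lookup written both ways. One version returns -1 as the "not found" value, and the other returns int? and uses null instead. IndexLookup and both method names are hypothetical. In .NET, List<T>.IndexOf already follows the first convention.

```csharp
using System;
using System.Collections.Generic;

public static class IndexLookup
{
    // Sentinel style: -1 means "not found", like List<T>.IndexOf.
    public static int FindIndexOrMinusOne(IList<string> items, string target)
    {
        for (int i = 0; i < items.Count; i++)
        {
            if (items[i] == target) return i;
        }
        return -1;
    }

    // Nullable style: null means "not found", so no magic value is needed.
    public static int? FindIndexOrNull(IList<string> items, string target)
    {
        int index = FindIndexOrMinusOne(items, target);
        return index >= 0 ? index : (int?)null;
    }
}
```

With the nullable version, callers check HasValue (or compare with null) instead of remembering that -1 is special.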
Nothing special about it. It's just that in most frameworks and libraries, functions or methods that return an index of an element in a collection will return -1 when whatever you're looking for isn't in the collection.
For example, the index of the character b in the string foo would be -1 in JavaScript, .NET and Java.
So many developers have it burned into their memory, like a ROM, that -1 is the index for items that aren't found. Now you know why.
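The convention is easy to see with the built-in .NET lookups. It also explains loops such as while (index != -1) from the question: the loop keeps going until the search reports that nothing more was found. NotFoundExamples and its methods are hypothetical names. String.IndexOf, List<T>.IndexOf and Array.IndexOf are the standard .NET methods.

```csharp
using System;
using System.Collections.Generic;

public static class NotFoundExamples
{
    // Each of these .NET methods returns -1 when the item is absent.
    public static int[] LookUpMissingItems()
    {
        int inString = "foo".IndexOf('b');                     // 'b' is not in "foo"
        int inList = new List<int> { 1, 2, 3 }.IndexOf(4);    // 4 is not in the list
        int inArray = Array.IndexOf(new[] { "x", "y" }, "z"); // "z" is not in the array
        return new[] { inString, inList, inArray };
    }

    // The typical loop: keep searching until IndexOf reports -1 ("no more matches").
    public static int CountOccurrences(string text, char c)
    {
        int count = 0;
        int index = text.IndexOf(c);
        while (index != -1)
        {
            count++;
            index = text.IndexOf(c, index + 1);
        }
        return count;
    }
}
```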
If you know that an int should never be negative (for instance, an item count or an index in a list), -1 can serve as a kind of "reserved value". For instance, you could initialize the count to -1, and as long as it is still -1, you know no real value has been put in there, a bit like null.
Other than that, I don't think there's any special meaning to -1.

Can storing data in a database sometimes lead to corrupted data? [closed]

Closed 10 years ago.
I have a field that's stored in the database as a string. It's actually a comma-separated string of numbers that I convert to a list of longs. The lines of code that do the conversion look somewhat like this:
TheListOfLongs = (from string s in StringFromDB.Split(',')
select Convert.ToInt64(s)).ToList<long>();
The code that creates the database storage string looks like this:
return String.Join(",", TheListOfLongs.Select(x=> x.ToString()).ToArray());
This works fine, but as you can see, if for some reason there's a problem with the string, the first snippet throws at Convert.ToInt64(s).
I know I can wrap all this in a try statement, but my question is this: can storing and retrieving a string in the database corrupt the string (in which case I definitely need the try statement), or is this a one-in-a-trillion type of event?
I wouldn't worry about corrupt data per se. However, you definitely need to handle the more general case where you can't parse what should be numeric data.
Even in the most controlled situations it is good programming practice to provide conditions for when you can't process data as you're expecting to be able to. What that means to your application is something you'll need to decide. Wrapping the statement with a try..catch will prevent your application from choking, but may not be appropriate if the parsed list is critical later on.
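One way to handle unparseable data without wrapping everything in try..catch is to use long.TryParse and report failure to the caller. This is a sketch, and LongListCodec and its methods are hypothetical names. It also covers a case that needs no corruption at all: String.Join saves an empty list as "", and "".Split(',') returns a single empty string, which Convert.ToInt64 rejects with a FormatException. Using CultureInfo.InvariantCulture on both sides is an added assumption that keeps the stored text independent of the current culture's number formatting.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class LongListCodec
{
    // Serialize with the invariant culture so the text doesn't depend on locale settings.
    public static string Serialize(IEnumerable<long> values) =>
        string.Join(",", values.Select(v => v.ToString(CultureInfo.InvariantCulture)));

    // Returns false (and values = null) instead of throwing when any entry can't be parsed.
    public static bool TryDeserialize(string text, out List<long> values)
    {
        values = new List<long>();
        if (string.IsNullOrWhiteSpace(text)) return true; // an empty list is stored as ""

        foreach (string part in text.Split(','))
        {
            if (!long.TryParse(part, NumberStyles.Integer, CultureInfo.InvariantCulture, out long value))
            {
                values = null;
                return false;
            }
            values.Add(value);
        }
        return true;
    }
}
```

What the caller does when TryDeserialize returns false (log, fall back to an empty list, or fail the request) is still the application-specific decision described above.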
Selecting from the DB shouldn't corrupt your string.
If the connection drops mid-transfer or something like that, an exception is thrown.

Regular expression for ensuring domain name is English characters only [closed]

Closed 10 years ago.
What's a good, clear regex for matching a domain name that must consist of:
Only English alpha characters, plus numbers
Including spaces or other separator characters that are valid, and reliably handled within a domain name
To clarify, this is for the purposes of validating a domain name. Whilst there are moves in the internet community to support internationalisation of domain names, I've done a fair bit of research into this and to keep my explanation fairly simple, only domain names that include characters that are part of a modern UK English character set (including numbers) are reliably handled by the Domain Name System (DNS). I'm not indicating a desire to prohibit internationalisation - I've done a lot of work during my career doing the opposite!
To answer this, here is the kind of thing I was looking for (tested and works). Sorry the original question wasn't explicit enough about what I was trying to do. I've upvoted the suggestions that helped me provide this answer to the community:
^[\w- .]*$
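'\w' = word characters. By default in .NET this also includes Unicode letters such as é; it is limited to [a-zA-Z0-9_] only when RegexOptions.ECMAScript is used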
'- .' = allow '-', ' ', '.'
asterisk = any of the previous characters zero or more times
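You can see the difference in .NET quickly. By default, \w matches any Unicode letter, so ^[\w- .]*$ accepts "café.com". With RegexOptions.ECMAScript, \w is limited to [a-zA-Z0-9_]. An explicit ASCII class gives the same result without depending on options. DomainCharsCheck and its methods are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DomainCharsCheck
{
    // Default .NET semantics: \w includes Unicode letters such as 'é'.
    public static bool MatchesDefault(string s) =>
        Regex.IsMatch(s, @"^[\w- .]*$");

    // ECMAScript semantics: \w is limited to [a-zA-Z0-9_].
    public static bool MatchesEcmaScript(string s) =>
        Regex.IsMatch(s, @"^[\w- .]*$", RegexOptions.ECMAScript);

    // Spelling the ASCII set out explicitly avoids depending on regex options.
    public static bool MatchesExplicitAscii(string s) =>
        Regex.IsMatch(s, @"^[A-Za-z0-9_ .-]*$");
}
```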
You can use this one, anchored so that the whole string has to match:
(?i)^[a-z0-9\p{Z}]+$
where \p{Z} is the "all separators" Unicode category and (?i) turns on case-insensitive matching.
You may use [a-zA-Z\d\s\p{P}]+ as the simplest solution, or go with a non-Unicode solution:
POSIX defines character classes such as [:alnum:], but not every regex engine supports them (.NET does not).
Equivalent explicit sets can be used instead:
[:alnum:] = [A-Za-z0-9] (alphanumeric characters)
[:space:] = [ \t\r\n\v\f] (whitespace characters)
[:punct:] = [\]\[!"#$%&'()*+,./:;<=>?@\^_`{|}~-] (punctuation characters)
Putting them together gives:
^[A-Za-z0-9 \t\r\n\v\f\]\[!"#$%&'()*+,./:;<=>?@\^_`{|}~-]+$
This way you can see exactly what will and will not be matched. Note that some characters are escaped with \ because without escaping they would have a special meaning inside the character class.
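The combined set can be used from C# roughly as follows. Watch for two things. Inside a verbatim @"..." string, the double quote has to be written twice (""). The punctuation list here is the full POSIX [:punct:] set, which includes @. AsciiTextCheck and IsAsciiText are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsciiTextCheck
{
    // ASCII letters, digits, whitespace and POSIX punctuation, one or more characters.
    // In a verbatim string, "" stands for a single double-quote character.
    private static readonly Regex AsciiPrintable = new Regex(
        @"^[A-Za-z0-9 \t\r\n\v\f\]\[!""#$%&'()*+,./:;<=>?@\^_`{|}~-]+$");

    public static bool IsAsciiText(string s) => AsciiPrintable.IsMatch(s);
}
```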

What algorithm can break text up into its component words? [closed]

Closed 10 years ago.
I was pleasantly surprised to find how easy it is to use iTextSharp to extract the text from a pdf file. By following this article, I was able to get a pdf file converted to text with this simple code:
string pdfFilename = dlg.FileName;
// Show just the file name, without the path
string pdfFileNameOnly = System.IO.Path.GetFileName(pdfFilename);
lblFunnyMammalsFile.Content = pdfFileNameOnly;
string textFilename = String.Format(@"C:\Scrooge\McDuckbilledPlatypus\{0}.txt", pdfFileNameOnly);
PDFParser pdfParser = new PDFParser();
if (!pdfParser.ExtractText(pdfFilename, textFilename))
{
    MessageBox.Show("there was a boo-boo");
}
The problem is that the text file generated contains text like this (i.e. it has no spaces):
IwaspleasantlysurprisedtofindhoweasyitistouseiTextSharptoextractthetextfromatextfile.
Is there an algorithm "out there" that will take text like that and make a best guess as to where the word breaks (AKA "spaces") should go?
Though I agree with Gavin that there's an easy way to solve this problem in this case, the problem itself is an interesting one.
Solving it in general requires a heuristic algorithm. I'll explain why in a moment, but first, here is my algorithm.
Store all the dictionary words in a trie, where each node records whether a word ends there. Then walk the sentence through the trie from the current position. As soon as you reach the end of a word, insert a space after it and continue from the next character. This will work for your sentence. But consider these two examples:
He gave me this book
He told me a parable
For the first example, the above algorithm works fine but for the second example, the algorithm outputs:
He told me a par able
To avoid this, we need to prefer the longest match, but if we do that, the output for the first example becomes:
He gave met his book.
So we are stuck, and we need to add heuristics to the algorithm that can judge that "He gave met his book" doesn't make grammatical sense.
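The greedy trie approach can be sketched in C# as below. A flag chooses between taking the first (shortest) dictionary word and taking the longest one. This covers only the trie lookup. The grammatical heuristics are not implemented, and when no dictionary word starts at a position, the sketch simply emits that single character. GreedySegmenter, TrieNode and the word list in the usage note are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Minimal trie node: children keyed by character, plus an end-of-word marker.
public sealed class TrieNode
{
    public readonly Dictionary<char, TrieNode> Children = new Dictionary<char, TrieNode>();
    public bool IsWordEnd;
}

public static class GreedySegmenter
{
    public static TrieNode BuildTrie(IEnumerable<string> words)
    {
        var root = new TrieNode();
        foreach (string word in words)
        {
            TrieNode node = root;
            foreach (char c in word.ToLowerInvariant())
            {
                if (!node.Children.TryGetValue(c, out TrieNode child))
                {
                    child = new TrieNode();
                    node.Children[c] = child;
                }
                node = child;
            }
            node.IsWordEnd = true;
        }
        return root;
    }

    // preferLongest = false: stop at the first (shortest) dictionary word.
    // preferLongest = true: keep walking and use the longest dictionary word seen.
    // If no word starts at a position, that single character is emitted unchanged.
    public static List<string> Segment(string text, TrieNode root, bool preferLongest)
    {
        var result = new List<string>();
        string lower = text.ToLowerInvariant();
        int pos = 0;
        while (pos < lower.Length)
        {
            int end = -1; // exclusive end index of the chosen word, -1 if none
            TrieNode node = root;
            for (int i = pos; i < lower.Length; i++)
            {
                if (!node.Children.TryGetValue(lower[i], out TrieNode next)) break;
                node = next;
                if (node.IsWordEnd)
                {
                    end = i + 1;
                    if (!preferLongest) break;
                }
            }
            if (end == -1) end = pos + 1; // no dictionary word here: fall back to one character
            result.Add(text.Substring(pos, end - pos));
            pos = end;
        }
        return result;
    }
}
```

Suppose the word list is he, told, me, par, able, parable, gave, met, his, this and book ("a" is deliberately left out and falls through as an unmatched character). Shortest-match mode then turns "hetoldmeaparable" into "he told me a par able", and longest-match mode turns "hegavemethisbook" into "he gave met his book". These are the two failures described above. Each mode gets the other sentence right.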

Phone Number Validation in Multiple Countries [closed]

Closed 11 years ago.
I am having trouble finding regex validation patterns for phone numbers in different countries. I have little time to write my own, so I was hoping a regex guru would be able to help.
I've already checked the usual sources like regexlib, so I'd be grateful for help with any of them.
I need a separate phone number validation expression for each of the following:
Germany
US
Australia
New Zealand
Canada
Asia
France
The format is here.
Writing the regex is not trivial, but if you specify the rules, it would not be difficult.
Instead of making an elaborate regular expression to match how you think your visitor will input their phone number, try a different approach. Take the phone number input and strip out the symbols. Then the regular expression can be simple and just check for 10 numeric digits (US number, for example). Then if you need to save the phone number in a consistent format, build that format off of the 10 digits.
This example validates U.S. phone numbers by looking for 10 numeric digits.
using System.Text.RegularExpressions;

protected bool IsValidPhone(string strPhoneInput)
{
    // Remove symbols (dash, space, parentheses, etc.)
    string strPhone = Regex.Replace(strPhoneInput, @"[- ()\*\!]", String.Empty);
    // Check for exactly 10 numbers left over
    Regex regTenDigits = new Regex(@"^\d{10}$");
    Match matTenDigits = regTenDigits.Match(strPhone);
    return matTenDigits.Success;
}
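The "build that format off of the 10 digits" step could look like the sketch below. It assumes US numbers without a leading country code and uses (XXX) XXX-XXXX as the stored format. It also strips every character that is not an ASCII digit, which is broader than the symbol list in the method above. UsPhoneNormalizer and TryNormalize are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UsPhoneNormalizer
{
    // Keeps only ASCII digits; if exactly 10 remain, formats them as (XXX) XXX-XXXX.
    public static bool TryNormalize(string input, out string formatted)
    {
        string digits = Regex.Replace(input ?? string.Empty, "[^0-9]", string.Empty);
        if (digits.Length != 10)
        {
            formatted = null;
            return false;
        }
        formatted = $"({digits.Substring(0, 3)}) {digits.Substring(3, 3)}-{digits.Substring(6)}";
        return true;
    }
}
```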
A phone number is just a number; what exactly do you want to validate there?
Here you can see what numbers look like in different countries.
Also, there is no country called Asia; it is a continent with many countries.
It's close to impossible to get a single regular expression that will cover all countries.
I'd go with [0-9+][0-9() ]* -- this simply allows any digit to start (or the "+" character), then any combination of digits or parentheses or spaces.
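Whichever character set you settle on, anchor it when validating. In .NET, Regex.IsMatch succeeds if any substring matches, so the bare [0-9+][0-9() ]* accepts "abc1" because of its final digit. Wrapping the pattern in ^...$ requires the whole input to fit. LoosePhoneCheck and its methods are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LoosePhoneCheck
{
    // Unanchored: succeeds if ANY substring matches, e.g. the "1" in "abc1".
    public static bool MatchesUnanchored(string s) =>
        Regex.IsMatch(s, @"[0-9+][0-9() ]*");

    // Anchored: the whole input must consist of the allowed characters.
    public static bool MatchesAnchored(string s) =>
        Regex.IsMatch(s, @"^[0-9+][0-9() ]*$");
}
```

Note that even the anchored form rejects dashes, so "555-1234" fails. Add - to the second class if such input should pass.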
In general validation any further is not really going to be of much use. If the user of the page wants to be contacted by phone, they'll enter a valid phone number -- if not, then they won't.
A better way to enforce a correct phone number and eliminate most simple miskeying is to require the number to be entered twice -- then the user is likely to at least check it!
