Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Are C# and JavaScript Regular Expressions different?
Is there a list of these differences?
Here's a difference we bumped into that I haven't seen documented anywhere, so I'll publish it, and the solution, here in the hope that it will help someone.
We were testing "some but not all" character classes, such as "A to Z but not Q, V or X", using the "[A-Z-[QVX]]" syntax. I don't know where we found it or whether it's documented, but it works in .NET.
For example, in PowerShell, using the .NET regex class,
[regex]::ismatch("K", "^[A-Z-[QVX]]$")
returns true. Test the same input and pattern in JavaScript and it returns false, but test "K" against "^[A-Z]$" in JavaScript and it returns true.
You can use the more orthodox approach of negative lookahead to express "A to Z but not Q, V or X", e.g. "^(?![QVX])[A-Z]$", which works in both PowerShell and (modern) JavaScript.
Given Ben Atkin's point about IE6 and IE7 not supporting lookahead, it may be that the only way to do this in a fool-proof (or IE7-proof) way is to expand the expression out, e.g. "[A-Z-[QVX]]" -> "[ABCDEFGHIJKLMNOPRSTUWYZ]". Ouch.
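As a rough cross-check of the two portable forms, here is a minimal sketch. It uses Python's re module purely as a neutral test bed (an assumption of this example, since the thread is about .NET and JavaScript); the lookahead and plain character-class syntax used here are common to all three engines. The names EXCLUDED, lookahead and expanded are made up for the illustration.

```python
import re
import string

# Letters to exclude from A-Z (taken from the example above).
EXCLUDED = "QVX"

# Portable form 1: a negative lookahead followed by the full range.
lookahead = re.compile(r"^(?![QVX])[A-Z]$")

# Portable form 2: the class written out with the excluded letters removed.
allowed = "".join(c for c in string.ascii_uppercase if c not in EXCLUDED)
expanded = re.compile("^[" + allowed + "]$")

print(allowed)  # ABCDEFGHIJKLMNOPRSTUWYZ

# Both forms should agree for every single uppercase letter.
for ch in string.ascii_uppercase:
    assert bool(lookahead.match(ch)) == bool(expanded.match(ch))
```

Generating the expanded class from the excluded letters avoids typing it out by hand, which is where a missing letter or bracket tends to creep in.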
First, some resources:
Mozilla Development Center JavaScript Guide: Regular Expressions
.NET Framework Regular Expressions - see the links at the bottom of the page
Here are a few differences:
Lookahead is not supported in IE6 and IE7. (Search for x(?=y) in the MDC guide for examples.)
JavaScript didn't support named capture groups until ES2018. Example: (?<foo>)
The list of metacharacters supported by JavaScript is much shorter.
I plan to write a function that parses a string parameter containing a math expression. Only the four basic operations are allowed (addition, subtraction, multiplication and division), and the numbers are all whole numbers from -100 to 100. The result may be a float. I know registers work the same way, i.e. calculate the result of two numbers and store it, then calculate the result of the stored value and the next operand and store that, and so on until there are no operands left. There will usually be 2 operands, but I will sometimes need 3 or more, so supporting more operands is a requirement.
How would you structure this in C#? What tools or helper functions would you use in this scenario?
Note: I am working on Unity 5.1.4 project and I want to use a math parser in it. Unity is .NET 2.0
Note: This seems most promising: http://mono.1490590.n4.nabble.com/Javascript-eval-function-in-c-td1490783.html
It uses a variant of eval() function.
.NET has no high-level helper functions for this. You would have to parse and tokenize the string in your own code. There are, however, third-party libraries that do what you need, for instance Expression Compiler, Simple Math Parser, Mathos Parser, and many others. Search for "math expression parser".
If you want to write one from scratch, you could look at the code of existing ones.
Hans Passant mentions a simple solution that may be just what you need. It gives you the result of the expression, so if you need only that, and not the actual expression tokens, then .NET has you covered.
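For the from-scratch route, a minimal sketch of a tokenizer plus recursive-descent evaluator might look like the following. It is written in Python rather than C# purely for illustration, and the names tokenize and evaluate are hypothetical. It handles whole numbers, the four operations with the usual precedence, unary minus and, as an extra the question does not ask for, parentheses.

```python
import re

# A token is a run of digits or a single operator/parenthesis.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split the expression into number and operator tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError("unexpected character at position %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def evaluate(text):
    """Evaluate +, -, *, / over whole numbers; the result is a float."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():
        # A number, a negated factor, or a parenthesised expression.
        tok = take()
        if tok == "-":
            return -factor()
        if tok == "(":
            value = expr()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if tok is None or not tok.isdigit():
            raise ValueError("expected a number")
        return float(tok)

    def term():
        # Multiplication and division bind tighter than + and -.
        value = factor()
        while peek() in ("*", "/"):
            if take() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def expr():
        value = term()
        while peek() in ("+", "-"):
            if take() == "+":
                value += term()
            else:
                value -= term()
        return value

    result = expr()
    if peek() is not None:
        raise ValueError("unexpected token %r" % peek())
    return result
```

Each grammar level keeps a running value and folds the next operand into it, which is the "calculate, store, continue with the next operand" idea from the question, applied once per precedence level. Division by zero surfaces as Python's ZeroDivisionError in this sketch.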
This tool did the job without adding any external references, DLLs or the like: http://mono.1490590.n4.nabble.com/Javascript-eval-function-in-c-td1490783.html
I need to create a simple search engine for my application. Let's simplify it to the following: we have a lot of texts, and I need to search them and show relevant results.
I based it on this great article, extended a few things, and it works pretty well for me.
But I have a problem with stemming words to terms. For example, the words "annotation", "annotations", etc. will be stemmed to "annot", but if you try to search for something, you will see unexpected results:
"anno" - nothing
"annota" - nothing
etc.
Only the word "annot" gives a relevant result. So how should I improve my search to give the expected results? After all, "annot" contains "anno", and "annota" is only slightly longer than "annot". Using contains all the time obviously isn't the solution.
In the first case I can use a ternary search tree, but in the second case I don't know what to do.
Any ideas would be very helpful.
UPDATE
oleksii has pointed me to n-grams here, which may work for me, but I don't know how to index n-grams properly.
So the questions are:
Which data structure would be best for my needs?
How do I index my n-grams properly?
Stemming perhaps isn't very relevant here. Stemming will convert a plural to a singular form.
Given that you have a tokeniser, a stemmer and a cleaner (to remove stop words, perhaps punctuation and numbers, short words, etc.), what you are looking at is full-text search. I would advise you to get an off-the-shelf solution (like Elasticsearch, Lucene or Solr), but if you fancy a DIY approach I can suggest the following naive implementation.
Step 1
Create a search-orientated tokeniser. One example would be an n-gram tokeniser. It will take your word and split into the following sequences:
annotation
1 - [a, n, n, o, t, ...]
2 - [an, nn, no, ot, ...]
3 - [ann, nno, not, ota, ...]
4 - [anno, nnot, nota, otat, ...]
....
Step 2
Sort n-grams for more efficient look-up
Step 3
Search n-grams for exact match using binary search
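The three steps above can be sketched as follows. This is a naive, in-memory illustration under stated assumptions: the function names ngrams, build_index and lookup are made up for the example, every n-gram length from 1 up to the word length is indexed, and the index is a plain sorted list of (n-gram, word) pairs searched with the standard bisect module.

```python
from bisect import bisect_left

def ngrams(word, min_n=1, max_n=None):
    """Step 1: every substring of length min_n..max_n (default: whole word)."""
    max_n = len(word) if max_n is None else max_n
    return {word[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(word) - n + 1)}

def build_index(words):
    """Step 2: a sorted list of (ngram, word) pairs."""
    return sorted({(gram, w) for w in words for gram in ngrams(w)})

def lookup(index, query):
    """Step 3: binary search for the first pair whose n-gram equals query."""
    # The 1-tuple (query,) sorts just before any (query, word) pair.
    i = bisect_left(index, (query,))
    hits = []
    while i < len(index) and index[i][0] == query:
        hits.append(index[i][1])
        i += 1
    return hits

index = build_index(["annotation", "note"])
print(lookup(index, "anno"))    # ['annotation']
print(lookup(index, "annota"))  # ['annotation']
```

Indexing every n-gram of every word grows quickly with word length, so a real index would probably cap max_n or restrict itself to prefixes; that trade-off is outside this sketch.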
I need to extract the URLs inside a string.
In my case the HTML text is in the DB; when I get that text I need to find all the URLs in it and insert them into another table. Can you give me a way to find the URLs in SQL or C#?
This is a regular expression to find URLs in text:
Regex regx = new Regex("http://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?", RegexOptions.IgnoreCase);
MatchCollection matches = regx.Matches(txt);
One of the possible ways to do it is by using Regular expressions. First option is to extract HTML from the DB, then use Regular Expression to find the links directly. The second option is to locate link tags first, then extract url from them (again by using Regular expressions).
Here you can find information about how to use Regular Expressions in C#:
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx
On the other hand, writing the correct Regular Expression may not be so easy (it depends on how complex the URL is), but you should take a look at this question: regular expression for url
Also, here you can find a lot of information about regular expressions in general (keep in mind that there are some applications like RegexBuddy, that can help you a lot when it comes to testing your regular expressions): http://www.regular-expressions.info/
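To illustrate the first option (running a regular expression directly over the stored HTML), here is a minimal sketch in Python. The pattern is deliberately loose and is an assumption of this example, not the expression from the question: it takes anything starting with http:// or https:// up to the next whitespace, quote or angle bracket. The names URL_PATTERN and extract_urls are hypothetical.

```python
import re

# Loose pattern: scheme, then everything up to whitespace, a quote or < >.
URL_PATTERN = re.compile(r"""https?://[^\s"'<>]+""", re.IGNORECASE)

def extract_urls(html_text):
    """Return the distinct URLs found in html_text, in order of appearance."""
    urls = []
    for match in URL_PATTERN.finditer(html_text):
        # Drop punctuation that usually belongs to the surrounding sentence.
        url = match.group(0).rstrip(".,;:!?)")
        if url not in urls:
            urls.append(url)
    return urls

sample = '<p>See <a href="http://example.com/a?b=1">this</a> and https://example.org/path.</p>'
for url in extract_urls(sample):
    print(url)
```

Each returned URL could then be inserted into the second table. Relative links and unusual schemes are not covered, which is where the second option (locating the link tags first) becomes attractive.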
Is it possible to convert a string expression into a boolean condition?
For example, I get the following string:
var b = "32 < 45 && 32 > 20"
I would like to create a bool expression out of this and invoke it. The string representation is also flexible (to make it more fun), so it allows ||, &&, ().
Have a look at Flee (Fast Lightweight Expression Evaluator) on CodePlex.
I would use Irony, the .NET language kit. You could construct a simple grammar with Irony and then parse the string into an executable command. There's a decent example of an arithmetic grammar in this tutorial and in the Expression Grammar Sample; it's a pretty common request ;)
I definitely suggest using a proper compiler as opposed to Regex or a roll your own approach - it will be much more extensible if you ever want to add more rules.
If it follows all C# expression rules then compile it as dynamic code as per http://www.west-wind.com/presentations/dynamiccode/dynamiccode.htm
If you're dealing with relatively simple mathematical expressions then a straightforward implementation of the shunting-yard algorithm should do the trick.
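A minimal sketch of that idea, applied to the boolean string from the question, could look like this. It is written in Python for illustration only; the operator table, its precedence numbers (chosen to mirror C#'s ordering of ||, &&, equality and relational operators) and the function names are assumptions of this example.

```python
import operator
import re

# Operator table: token -> (precedence, function). Higher binds tighter.
OPS = {
    "||": (1, lambda a, b: bool(a) or bool(b)),
    "&&": (2, lambda a, b: bool(a) and bool(b)),
    "==": (3, operator.eq), "!=": (3, operator.ne),
    "<": (4, operator.lt), "<=": (4, operator.le),
    ">": (4, operator.gt), ">=": (4, operator.ge),
}
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|\|\||&&|[<>=!]=|[<>()])")

def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError("unexpected character at position %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def to_postfix(tokens):
    """Shunting-yard: reorder infix tokens into postfix (RPN)."""
    output, stack = [], []
    for tok in tokens:
        if tok in OPS:
            # Pop operators of higher or equal precedence (left-associative).
            while stack and stack[-1] in OPS and OPS[stack[-1]][0] >= OPS[tok][0]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()  # discard the "("
        else:
            output.append(float(tok))
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced parentheses")
        output.append(op)
    return output

def evaluate(text):
    """Evaluate the postfix form with a value stack."""
    stack = []
    for item in to_postfix(tokenize(text)):
        if isinstance(item, str):  # operators are the only strings
            if len(stack) < 2:
                raise ValueError("malformed expression")
            right, left = stack.pop(), stack.pop()
            stack.append(OPS[item][1](left, right))
        else:
            stack.append(item)
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return bool(stack[0])

print(evaluate("32 < 45 && 32 > 20"))  # True
```

Adding another binary operator is one more row in the table; unary operators such as ! would need extra handling that this sketch leaves out.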
Take a look at my library, Proviant. It's a .NET Standard library using the Shunting Yard algorithm to evaluate boolean expressions. You could also implement your own grammar.
I think creating an interpreter for this string would not take too long.
http://www.industriallogic.com/xp/refactoring/implicitLanguageWithInterpreter.html
The page above describes a design that can be used to create it.
You could take a look at JINT (Javascript Interpreter for .NET) http://jint.codeplex.com/
When users create an account on my site, I want server-side validation for emails so that not every input is accepted.
I will send a confirmation email, as a kind of handshake validation.
I am looking for something simple: not the best, but not so simple that it doesn't validate anything. I don't know where the limit should be, since no regular expression will validate correctly, because full validation is not possible with regular expressions.
I'm trying to limit the syntax and visual complexity inherent to regular expressions, since in this case none of them will be fully correct anyway.
What regexp can I use to do that?
It's possible to write a regular expression that only accepts email addresses that follow the standards. However, there are some email addresses out there that don't strictly follow the standards, but still work.
Here are some simple regular expressions for basic validation:
Contains a @ character:
@
Contains @ and a period somewhere after it:
@.*?\.
Has at least one character before the @, before the period and after it:
.+@.+\..+
Has only one @, at least one character before the @, before the period and after it:
^[^@]+@[^@]+\.[^@]+$
User AmoebaMan17 suggests this modification to eliminate whitespace:
^[^@\s]+@[^@\s]+\.[^@\s]+$
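As a quick sanity check of the whitespace-free variant, here is a minimal sketch in Python. The helper name looks_like_email is made up for the example, and fullmatch is used so that a trailing newline is not silently accepted by the $ anchor.

```python
import re

# One @, no whitespace, and at least one period after the @.
BASIC_EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def looks_like_email(value):
    """Cheap plausibility check, meant to run before a confirmation mail."""
    return BASIC_EMAIL.fullmatch(value) is not None

for candidate in ["user@example.com", "user@@example.com",
                  "user @example.com", "user@example"]:
    print(candidate, looks_like_email(candidate))
```

A check like this only filters out obvious typos; the confirmation email remains the real validation.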
And for accepting only one period [external edit: not recommended, does not match valid email addresses]:
^[^@\s]+@[^@\s\.]+\.[^@\.\s]+$
^\S+@\S+$
^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$
Only 1 @
Several domains and subdomains
I think this little tweak to the expression by AmoebaMan17 should stop the address from starting/ending with a dot and also stop multiple dots next to each other. Trying not to make it complex again whilst eliminating a common issue.
(?!.*\.\.)(^[^\.][^@\s]+@[^@\s]+\.[^@\s\.]+$)
It appears to be working (but I am no RegEx-pert). Fixes my issue with users copy&pasting email addresses from the end of sentences that terminate with a period.
e.g.: Here's my new email address tabby@coolforcats.com.
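A small sketch for exercising the tweaked expression against the cases it is meant to catch, again in Python and with a hypothetical helper name (accepts):

```python
import re

# The tweaked pattern: no ".." anywhere, no leading dot, no trailing dot.
TWEAKED = re.compile(r"(?!.*\.\.)(^[^\.][^@\s]+@[^@\s]+\.[^@\s\.]+$)")

def accepts(value):
    return TWEAKED.fullmatch(value) is not None

for candidate in ["tabby@coolforcats.com",    # plain address
                  "tabby@coolforcats.com.",   # pasted with the sentence's period
                  ".tabby@coolforcats.com",   # leading dot
                  "tab..by@coolforcats.com",  # consecutive dots
                  "a@b.co"]:                  # one-character local part
    print(candidate, accepts(candidate))
```

One side effect worth knowing: because the pattern needs one non-dot character followed by at least one more character before the @, a one-character local part such as a@b.co is rejected as well.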
Take your pick.
Here's the one that complies with RFC 2822 Section 3.4.1:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
Just in case you are curious. :)