Folding case to speed up comparisons

Folding case to speed up comparisons - c#

"strasse".Equals("STRAße",StringComparison.InvariantCultureIgnoreCase)
This returns true. Which is correct. Unfortunately, when I store one of these in postgres, it thinks they are not the same when doing a case insensitive match (for example, with ~*). I've also tested with citext.
So one solution would be to pre-fold the case, thus storing strasse for either of these values, in another column. I could then index and search on that for matches.
I've been looking for how to fold case in C# for a while, and haven't been able to find a solution in C#. Obviously that knowledge is there because it can compare these strings properly, I just can't find where to get it from.
One solution would be to spawn a perl process perl -E "binmode STDOUT, ':utf8'; binmode STDIN, ':utf8'; while (<>) { print fc }", set the C# side of the process to utf8 for those pipes as well, and just send the text through perl to fold the case. But there has to be a better way than that.

There is string.Normalize(), which takes a NormalizationForm parameter. Michael Kaplan goes into detail on this. He claims it does a better job than FoldStringW.
It does not, however, normalize the case to upper or lower, it only folds to the canonical form. I would suggest you just apply ToUpper or ToLower afterwards.

Looking through the sources I eventually found that most of this implementation is in a set of classes called CompareInfo.
You can find these at github.com/dotnet/runtime
That led me to this page that clues in to the inner workings for the .net culture stuff. .NET globalization and ICU
It seems that dotnet is actually relying completely on native libraries for everything except ordinal operations.
I would assume by this that the .Net Framework is probably using NLS from Win32. For that there is the FoldStringW method that looks promising.
For ICU there is documentation for Case Mappings and I found the u_strFoldCase method.

Related

How to filter values based on a property of the node in gremlin

So i understand how to filter values on gremlin console but things like filter, gt etc dont work on gremlin.net. continuously get errors.
I would like to know how to use Filter in gremlin.net to filter out nodes or edges. I cant find documentation pertaining to how to do this in C# using the gremlin.net library
I tried writing the code i write on gremlin console but some of those functions were not recognized
I am trying to filter out all those nodes that have the property idnum greater than 5
: g.V().Has("idnum", gt(5));
it keeps saying gt is not found under the current context.

Gremlin is largely the same regardless of the programming language you use. There are typically only minor differences in syntax as it relates to the idioms of the programming language itself (e.g. in Java we typically see the initial letter in method names lower cased whereas in C# they are upper cased). So, the general step documentation, though demonstrated in Groovy/Java style, typically gives you enough information on how steps work for you to then translate to your language of choice. Also in that same documentation, where necessary, there are specific notes on programming language specific differences that may be relevant.
That said, I assume your issue is related to the importing of P.gt() for C#:
using static Gremlin.Net.Process.Traversal.P;
You can read more about other Common Imports in the Reference Documentation here.

Implement Language Auto-Completion based on ANTLR4 Grammar

I am wondering if are there any examples (googling I haven't found any) of TAB auto-complete solutions for Command Line Interface (console), that use ANTLR4 grammars for predicting the next term (like in a REPL model).
I've written a PL/SQL grammar for an open source database, and now I would like to implement a command line interface to the database that provides the user the feature of completing the statements according to the grammar, or eventually discover the proper database object name to use (eg. a table name, a trigger name, the name of a column, etc.).
Thanks for pointing me to the right direction.

Actually it is possible! (Of course, based on the complexity of your grammar.) Problem with auto-completion and ANTLR is that you do not have complete expression and you want to parse it. If you would have complete expression, it wont be any big problem to know what kind of element is at what place and to know what can be used at such a place. But you do not have complete expression and you cannot parse the incomplete one. So what you need to do is to wrap the input into some wrapper/helper that will complete the expression to create a parse-able one. Notice that nothing that is added only to complete the expression is important to you - you will only ask for members up to last really written character.
So:
A) Create the wrapper that will change this (excel formula) '=If(' into '=If()'
B) Parse the wrapped input
C) Realize that you are in the IF function at the first parameter
D) Return all that can go into that place.
It actually works, I have completed intellisense editor for several simple languages. There is much more infrastructure than this, but the basic idea is as I wrote it. Only be careful, writing the wrapper is not easy if not impossible if the grammar is really complex. In that case look at Papa Carlo project. http://lakhin.com/projects/papa-carlo/

As already mentioned auto completion is based on the follow set at a given position, simply because this is what we defined in the grammar to be valid language. But that's only a small part of the task. What you need is context (as Sam Harwell wrote: it's a semantic process, not a syntactic one). And this information is independent of the parser. And since a parser is made to parse valid input (and during auto completion you have most of the time invalid input), it's not the right tool for this task.
Knowing what token can follow at a given position is useful to control the entire process (e.g. you don't want to show suggestions if only a string can appear), but is most of the time not what you actually want to suggest (except for keywords). If an ID is possible at the current position, it doesn't tell you what ID is actually allowed (a variable name? a namespace? etc.). So what you need is essentially 3 things:
A symbol table that provides you with all possible names sorted by scope. Creating this depends heavily on the parsed language. But this is a task where a parser is very helpful. You may want to cache this info as it is time consuming to run this analysis step.
Determine in which scope you are when invoking auto completion. You could use a parser as well here (maybe in conjunction with step 1).
Determine what type of symbol(s) you want to show. Many people think this is where a parser can give you all necessary information (the follow set). But as mentioned above that's not true (keywords aside).
In my blog post Universal Code Completion using ANTLR3 I especially addressed the 3rd step. There I don't use a parser, but simulate one, only that I don't stop when a parser would, but when the caret position is reached (so it is essential that the input must be valid syntax up to that point). After reaching the caret the collection process starts, which not only collects terminal nodes (for keywords) but looks at the rule names to learn what needs to be collected too. Using specific rule names is my way there to put context into the grammar, so when the collection code finds a rule table_ref it knows that it doesn't need to go further down the rule chain (to the ultimate ID token), but instead can use this information to provide a list of tables as suggestion.
With ANTLR4 things might become even simpler. I haven't used it myself yet, but the parser interpreter could be a big help here, as it essentially doing what I do manually in my implementation (with the ANTLR3 backend).

This is probably pretty hard to do.
Fundamentally you want to use some parser to predict "what comes next" to display as auto-completion. This has to at least predict what the FIRST token is at the point where the user's input stops.
For ANTLR, I think this will be very difficult. The reason is that ANTLR generates essentially procedural, recursive descent parsers. So at runtime, when you need to figure out what FIRST tokens are, you have to inspect the procedural source code of the generated parser. That way lies madness.
This blog entry claims to achieve autocompletion by collecting error reports rather than inspecting the parser code. Its sort of an interesting idea, but I do not understand how his method really works, and I cannot see how it would offer all possible FIRST tokens; it might acquire some of them. This SO answer confirms my intuition.
Sam Harwell discusses how he has tackled this; he is one of the ANTLR4 implementers and if anybody can make this work, he can. It wouldn't surprise me if he reached inside ANTLR to extract the information he needs; as an ANTLR implementer he would certainly know where to tap in. You are not likely to be so well positioned. Even so, he doesn't really describe what he did in detail. Good luck replicating. You might ask him what he really did.
What you want is a parsing engine for which that FIRST token information is either directly available (the parser generator could produce it) or computable based on the parser state. This is actually possible to do with bottom up parsers such as LALR(k); you can build an algorithm that walks the state tables and computes this information. (We do this with our DMS Software Reengineering Toolkit for its GLR parser precisely to produce syntax error reports that say "missing token, could be any of these [set]")

Creating mathematical function in runtime from string

I am asking this question, because I didn't find yet any posts that are C# related and there might be some build in methods for that I couldn't find. If there are, please tell me so and I can close this question.
Basically I have the common situation:
User types a function w.r.t. one or two variables into some TextBlock
I take this string analyse it
As a return I would like to have a delegate to a method that will take one or two inputs (the variables) and return the function value according to what the user typed in.
Now, I could probably think (and I would like to do this on my own, because I want to use my brain) of an algorithm of analysing the string step by step to actually find out, what has to be calculated first and in what way. E.g. First scan for parentheses, look for the expression within a group of parantheses and calculate that according to more general functions etc.
But in the end I would like to "create" a method of this analysis to be easily used as a normal delegate with a couple of arguments that will return the correct function value.
Are there any methods included in C# for that already, or would I have to go and program everything by myself?
As a remark: I don't want to use anybody else' library, only .NET libraries are acceptable for me.
Edit: After Matt pointed out expression trees, I found this thread which is a good example to my problem.
Edit2: The example pointed out does only include simple functions and will not be useful if I want to include more complex functions such as trigonometric ones or exponentials.

What you are describing is a parser. There are a number of different ways of implementing them, although generally speaking, for complex grammars, a "parser generator" is often used.
A parser generator will take a description of the grammar and convert it into code that will parse text that conforms to the grammar into some form of internal representation that can be manipulated by the program, e.g. a parse tree.
Since you indicate you want to avoid third-party libraries, I'll assume that the use of a parser generator is similarly excluded, which leaves you with implementing your own parser (which fortunately is quite an interesting exercise).
The Wikipedia page on Recursive descent parsers will be particularly useful. I suggest reading through it and perhaps adapting the example code therein to your particular use case. I have done this myself a number of times for different grammars with this as a starting point, so can attest to its usefulness.
The output from such a parser will be a "parse tree". And you then have a number of possibilities for how you convert this into an executable delegate. One option is to implement an Evaluate() method on your parse tree nodes, which will take a set of variables and return the result of evaluating the user's expression. As others have mentioned, your parse tree could leverage .NET's Expression trees, or you can go down the route of emitting IL directly (permitting you to produce a compiled .NET assembly from the user's expression for later use as required).

You might want to look at expression trees.

Check out NCalc for some examples of how to do this. You don't need to use the library, but reading the source is pretty educational.

I found a very helpful pdf explaining the parsing in C# 2.0. This link leads to a very good tutorial on parsers used in C# and also applies that later on to an arithmetic expression.
As this directly helps and answers to my question, I posted this as an answer, rather than as a comment or edit.

outlaw string in c# using intellesence

I often type string in c# when actually I want to type String.
I know that string is an alias of String and I am really just being pedantic but i wish to outlaw string to force me to write String.
Can this be done in ether visual studio intellesence or in resharper and how?

I've not seen it done before, but you may be able to achieve this with an Intellisense extension. A good place to start would be to look at the source for this extension on CodePlex.
Would be good to hear if you have any success with this.

I have always read in "Best Coding Practices" for C# to prefer string, int, float ,double to String, Int32, Single, Double. I think it is mostly to make C# look less like VB.NET, and more like C, but it works for me.
Also, you can go the other way, and add the following on top of every file
using S = System.String;
..
S msg = #"I don't like string.";
you may laugh at this, but I have found it invaluable when I have two similar source codes with different underlying data types. I usually have using num=System.Single; or using num=System.Double; and the rest of the code is identical, so I can copy and paste from one file to the other and maintain both single precision and double precision library in sync.

I think ReSharper can do this!
Here is an extract from the documentation:
ReSharper 5 provides Structural Search and Replace to find custom code constructs and replace them with other code constructs. What's even more exciting is that it's able to continuously monitor your solution for your search patterns, highlight code that matches them, and provide quick-fixes to replace the code according to your replace patterns. That essentially means that you can extend ReSharper's own 900+ code inspections with your custom inspections. For example, if you're migrating to a newer version of a framework, you can create search patterns to find usages of its older API and replace patterns to introduce an updated API.
cheers,
Chris

Sentence generator using Thesaurus

I am creating an application in .NET.
I got a running application name http://www.spinnerchief.com/. It did what I needed it to do but but I did not get any help from Google.
I need functional results for my application, where users can give one sentence and then the user can get the same sentence, but have it worded differently.
Here is an example of want I want.
Suppose I put a sentence that is "Pankaj is a good man." The output should be similar to the following one:
Pankaj is a great person.
Pankaj is a superb man.
Pankaj is a acceptable guy.
Pankaj is a wonderful dude.
Pankaj is a superb male.
Pankaj is a good human.
Pankaj is a splendid gentleman

To do this correctly for any arbitrary sentence you would need to perform natural language analysis of the source sentence. You may want to look into the SharpNLP library - it's a free library of natural language processing tools for C#/.NET.
If you're looking for a simpler approach, you have to be willing to sacrifice correctness to some degree. For instance, you could create a dictionary of trigger words, which - when they appear in a sentence - are replaced with synonyms from a thesaurus. The problem with this approach is making sure that you replace a word with an equivalent part of speech. In English, it's possible for certain words to be different parts of speech (verb, adjective, adverb, etc) based on their contextual usage in a sentence.
An additional consideration you'll need to address (if you're not using an NLP library) is stemming. In most languages, certain parts of speech are conjugated/modified (verbs in English) based on the subject they apply to (or the object, speaker, or tense of the sentence).
If all you want to do is replace adjectives (as in your example) the approach of using trigger words may work - but it won't be readily extensible. Before you do anything, I would suggest that you clearly defined the requirements and rules for your problem domain ... and use that to decide which route to take.

For this, the best thing for you to use is WordNet and it's hyponym/hypernym relations. There is a WordNet .Net library. For each word you want to alternate, you can either get it's hypernym (i.e. for person, a hypernym means "person is a kind of...") or hyponym ("X is a kind of person"). Then just replace the word you are alternating.
You will want to make sure you have the correct part-of-speech (i.e. noun, adjective, verb...) and there is also the issue of senses, which may introduce some undesired alternations (sense #1 is the most common).

I don't know anything about .Net, but you should look into using a dictionary function (I'm sure there is one, or at least a library that streamlines the process if there isn't).
Then, you'd have to go through the string, and ommit words like "is" or "a". Only taking words you want to have synonyms for.
After this, its pretty simple to have a loop spit out your sentences.
Good luck.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.