I need to do some natural language processing on various text inputs from users in a C#-based desktop application. I am using Antelope for this purpose. The first step is to split the text into sentences. Following the documentation provided by Antelope, I used:
using Proxem.Antelope;
using Proxem.Antelope.Lexicon;
using Proxem.Antelope.Tools;
using Proxem.Antelope.LinkGrammar;
using Proxem.Antelope.Stanford;
using NUnit.Framework;
...
...
...
ISentenceSplitter splitter = new Tools.MIL_HtmlSentenceSplitter();
splitter.Text = text;
foreach (string sentence in splitter.Sentences)
{
// Process sentence…
}
I have also added references to these libraries, but the build gives the error
The type or namespace name 'ISentenceSplitter' could not be found (are you missing a using directive or an assembly reference?) C:\Users\...
and
The type or namespace name 'Tools' could not be found (are you missing a using directive or an assembly reference?) C:\Users\...
I can't seem to figure out the solution. After searching the net I found that other people are having this problem too, but no one has actually found a solution. Can you please help me?
The simple answer is to avoid using this library. No offense to the authors, who may have done very good and hard work, but if the library cannot be used after every reasonable attempt, it is useless. The documentation says that a function belongs to a particular interface, but when you look for it, it doesn't exist in ANY of the available interfaces.
For those who are curious: I did contact the authors through their site but didn't get a reply even after 8 days.
There are other alternatives available, such as OpenNLP (Java) or its C# counterpart SharpNLP.
I had the same problem. It stems from the fact that the Antelope DLLs are built for .NET Framework 2.0, while your application is (possibly) set to target .NET 4.0.
I changed my application to target .NET 2.0 and the trouble disappeared. :)
In addition, I have used both SharpNLP and Antelope. Antelope has some superior features, like collapse detection and a port to WordNet, which I have not seen in SharpNLP.
Good luck.
Quite a simple answer to this one: ISentenceSplitter is not in the actual release. I have v0.8.7, which should be the same free 2009 version everyone else has. To confirm this, I ran
grep -r ISentenceSplitter .
and nothing came back. Try the same grep with an interface that does exist, such as ILexicon, and you will see all the DLLs that contain ILexicon.
Note that we are talking about a free version, and Proxem, like other reasonable companies, wants to market its technology as a profitable enterprise. Therefore you have to be happy with what you have, look into the paid solutions, use something else, or write your own library, which the community would be very happy to have.
I ran into that same error. Sali Hoo has the actual correct answer. This is not an answer for how to find or use ISentenceSplitter; however, it does show how Proxem's example application splits sentences as part of a larger process.
The example application that ships with the libraries can split sentences, so I figured the code allows for it. After reading a bit of code, I found that their example uses the Proxem.Antelope.Document class for sentence splitting and syntactical parsing. Rather than decompiling the libraries to see how the guts of the Document constructor uses ISentenceSplitter (I'm fairly confident that it does) I just used the class and it actually ended up performing all of the functions I needed, not just sentence splitting. I definitely recommend taking a look at how their example uses it, but here's the essence of my implementation:
public sealed class SyntaxService : ISyntaxService
{
    public SyntaxService(IParser parser, ILexicon lexicon)
    {
        m_parser = parser;
        m_lexicon = lexicon;
        // These are the settings I needed, YMMV. They seem to add a lot of UI logic to this method :(
        m_processingResources = new ProcessingResources(m_lexicon, null, null, m_parser, 5, null, false, null, null, null, null, false);
    }

    public IEnumerable<Sentence> GetSentences(string s)
    {
        // Document splits the text into sentences and parses each one.
        IDocument document = new Document(s, m_processingResources);
        return document.Select(Mappers.SentenceMapper.Map);
    }

    readonly IParser m_parser;
    readonly ILexicon m_lexicon;
    readonly IProcessingResources m_processingResources;
}
The Sentence class referenced here is a simplified syntax tree along with the original sentence value. If you're not super interested in this type of dependency injection friendly wrapping and mapping, you can just use the guts of GetSentences(string).
Related
I have a background in C++ and recently I started working in C#.
I have written the following code (in Visual Studio):
var list_Loads = database.GetData<Load>().ToList();
var test_list = list_Loads.Where(o => (o.Name.Substring(0, 3) == "123")).ToList();
When I run the program and hover the mouse over either list, I first get the count, which is very useful, but when I expand the entries, this is what I get:
0 : namespace.Load
1 : namespace.Load
2 : namespace.Load
...
Not very useful, as you can imagine :-)
So my question: how can I show the Name attributes of those objects?
I thought: no problem. I have a background in native visualisers, so it should be rather easy to turn this into useful information, but then it comes:
In order to alter the way that those objects are represented, there is the first proposal to add a [DebuggerDisplay] "tag" to the definition of that class in source code.
However, as those classes are part of a framework I'm just referring to, I don't have access to the source code and hence I can't modify this.
Then I found another solution, which comes down to: "Write an entire C# project, debug, test and install it and it might work" (see documentation on "Custom visualisers of data" on the Microsoft website).
I almost choked on my coffee: writing an entire project, just to alter the view of an object??? (In C++, you just create a simple .natvis file, mention the class name and some configuration, run .nvload, and that's it.)
Does anybody know a simple way to alter the appearance of C# object, without needing to pass through the whole burden of creating an entire C# project?
By the way, when I try to load a natvis file in Visual Studio immediate window, this is what I get:
.nvload "C:\Temp_Folder\test.natvis"
error CS1525: Invalid expression term '.'
What am I doing wrong?
Thanks in advance
OP (my emphasis):
In order to alter the way that those objects are represented, there is the first proposal to add a [DebuggerDisplay] "tag" to the definition of that class in source code.
However, as those classes are part of a framework I'm just referring to, I don't have access to the source code and hence I can't modify this.
Does anybody know a simple way to alter the appearance of C# object, without needing to pass through the whole burden of creating an entire C# project?
If you just want to specify [DebuggerDisplay] on a type, you don't have to have access to the source code. You can make use of [assembly:DebuggerDisplay()] and control how a type appears in the debugger. The only downside is that [assembly:DebuggerDisplay()] naturally only affects the current assembly whose code your mouse is hovering over. If you wish to use the customised display in other assemblies that you own, then you must repeat the [assembly:DebuggerDisplay()] definition.
Here's an easy before-and-after example with DateTime. I picked DateTime because we generally don't have access to the source code and it has some interesting properties:
var items = new List<DateTime>
{
DateTime.Now.AddDays(-2),
DateTime.Now.AddDays(-1),
DateTime.Now
};
...which on my machine defaults to:
Maybe I'm fussy and I just want to see:
Day of the week and
Day of the year
...I can do that via:
using System.Diagnostics;
[assembly: DebuggerDisplay("{DayOfWeek} {DayOfYear}", Target = typeof(DateTime))]
...which results in:
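Example:
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Assembly-level attributes go after the using directives and before any namespace or type.
[assembly: DebuggerDisplay("{DayOfWeek} {DayOfYear}", Target = typeof(DateTime))]

namespace DebuggerDisplayTests
{
    public class DebuggerDisplayTests
    {
        public DebuggerDisplayTests()
        {
            // Hover over items at a breakpoint to see the customised display.
            var items = new List<DateTime>
            {
                DateTime.Now.AddDays(-2),
                DateTime.Now.AddDays(-1),
                DateTime.Now
            };
        }
    }
    // ...
}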
Overrides
[assembly:DebuggerDisplay()] can also be used to override a pre-existing [DebuggerDisplay] on a third-party type. Don't like the style they have chosen? Is the type showing far too much information? Change it with [assembly:DebuggerDisplay()].
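Applied to the question's case, a minimal sketch might look like the following. The Framework.Load class here is a hypothetical stand-in declared in the same file only to keep the example self-contained; in the real scenario it lives in the framework assembly, and only the [assembly: DebuggerDisplay] line goes into your own project. The sample names are made up as well.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Assembly-level attributes go after the using directives and before any type.
[assembly: DebuggerDisplay("{Name}", Target = typeof(Framework.Load))]

namespace Framework
{
    // Hypothetical stand-in for the framework's Load type; in the real
    // scenario this class sits in an assembly you cannot edit.
    public class Load
    {
        public string Name { get; set; } = "";
    }
}

namespace Demo
{
    public static class Program
    {
        public static void Main()
        {
            var loads = new List<Framework.Load>
            {
                new Framework.Load { Name = "123-A" },
                new Framework.Load { Name = "456-B" },
                new Framework.Load { Name = "123-C" }
            };

            // StartsWith avoids the exception that Substring(0, 3) throws
            // for names shorter than three characters.
            var matches = loads
                .Where(o => o.Name.StartsWith("123", StringComparison.Ordinal))
                .ToList();

            // Set a breakpoint on the next line and hover over loads or matches:
            // each entry should show its Name instead of the type name.
            Console.WriteLine(matches.Count); // prints 2
        }
    }
}
```

The attribute only changes what the debugger shows; it has no effect on the program's behaviour or output.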
I am looking for information regarding the naming of a C# class with an identifying prefix or suffix that refers to a toolkit or company the library is from.
For example, suppose the company name is Audio Solutions and they want to create a C# class library for a toolkit named Audio Control Toolkit.
Example using the company abbreviation AS:
namespace AudioSolutions.AudioControlToolkit
{
    public class AS_FileConversions
    {
        public static void AS_MP3toWave() { .... }
    }
}
Example using the toolkit abbreviation ACT:
namespace AudioSolutions.AudioControlToolkit
{
    public class ACT_FileConversions
    {
        public static void ACT_MP3toWave() { .... }
    }
}
The prefixes are meant to identify which parts of an application use the toolkit. Is there any published guidance on branding a vendor's code this way? I don't know whether these naming choices will cause problems or are considered bad practice. Thanks in advance.
Update 4/15/2020 12:15pm PST
This topic appears to be very preference-driven, and no one has yet found any widely published material on it. I did find one big-name vendor that uses an identifying prefix on its functions; some examples are below:
https://cuda-tutorial.readthedocs.io/en/latest/tutorials/tutorial01/
Sample calls from that page:
cudaMalloc((void**)&d_a, sizeof(float) * N);
cudaMemcpy(d_a, a, sizeof(float) * N, cudaMemcpyHostToDevice);
cudaFree(d_a);
Other than identifying whether a certain vendor's SDK functions are being used, I could find only one other reason for adding such a prefix: to keep the compiler from complaining about ambiguity between two or more functions with the same name in different namespaces. That case is easily resolved without a prefix by fully qualifying the calls to one vendor's implementation with its namespace.
I would recommend against putting company names or abbreviations in the class name; use the namespace instead. I don't recall ever seeing a company name included directly in class names in any of the frameworks or toolkits I've used.
For example, Telerik has a number of highly popular frameworks and toolkits, and those typically contain Telerik in the namespace (e.g. Telerik.Reporting, Telerik.ReportViewer.Mvc), but never in the class names themselves.
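As a rough sketch of the namespace-only approach, a using alias can still make toolkit usage visible at every call site without putting a prefix in the class name. Everything below is illustrative: the OtherVendor.Audio namespace, the Act alias, and the method bodies are assumptions made up for this example, and the "conversion" only manipulates the file name.

```csharp
using System;
using System.IO;
// Alias: every use of the toolkit is visibly marked with "Act." at the call site.
using Act = AudioSolutions.AudioControlToolkit;

namespace AudioSolutions.AudioControlToolkit
{
    public static class FileConversions
    {
        // Placeholder body: only derives the output file name, converts nothing.
        public static string Mp3ToWave(string mp3Path) =>
            Path.ChangeExtension(mp3Path, ".wav");
    }
}

namespace OtherVendor.Audio
{
    // Hypothetical second vendor with a clashing class and method name.
    public static class FileConversions
    {
        public static string Mp3ToWave(string mp3Path) => mp3Path + ".converted";
    }
}

namespace Demo
{
    public static class Program
    {
        public static void Main()
        {
            // The alias or the full namespace resolves the ambiguity.
            Console.WriteLine(Act.FileConversions.Mp3ToWave("song.mp3"));               // prints song.wav
            Console.WriteLine(OtherVendor.Audio.FileConversions.Mp3ToWave("song.mp3")); // prints song.mp3.converted
        }
    }
}
```

Searching the code base for "Act." or for the namespace name then finds the toolkit's call sites, which is the job the AS_ or ACT_ prefix was meant to do.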
We have guidelines for how we want to use our namespaces, and there are also access restrictions between them. Because developers sometimes get this wrong, we need to check these rules. Currently we do this with NDepend, which works well. But the process, in which someone has to monitor the results, go to the person who violated a rule and make them fix it, is very time-consuming. It would be very nice to get instant feedback while developing, or at least after building the current changes. This should be a job for a Roslyn analyzer.
I've spent the past three hours getting acquainted with Roslyn, but I'm a bit overwhelmed by the feature list and how everything works. Maybe you can give me a hint about how I could achieve what I want.
We are talking about a solution with more than 1 million lines of code and nearly 35,000 types, so performance matters a lot.
What I want to do:
get the current class
get the namespace of the current class
get all used types with their full name
If I'm able to do this, the rest should be relatively easy. I've played around with it, and I may need the current project of the opened class and its compilation. But opening these is very time-consuming, so the performance would be very poor.
A Roslyn analyzer can register a number of different analysis actions, e.g. at the level of the whole file, a method, every single syntax node, or a symbol. Depending on what exactly you are trying to analyze, any of those might be applicable, especially since you are concerned about performance. See the AnalysisContext.Register*Action() methods for the possible "hooks" you can add.
To get the things that you want:
1 Get the current class
Basically, with any of those you should be able to get the current class (if you register a syntax node or symbol action) or all declared classes (for example, by registering a compilation action or a syntax tree action). The simplest option is to register a syntax node action for class declarations, like this:
context.RegisterSyntaxNodeAction(AnalyzeClassNode, SyntaxKind.ClassDeclaration);
Here AnalyzeClassNode is the method that analyzes the class declaration. It receives a SyntaxNodeAnalysisContext, which contains the class declaration syntax node.
2 Get the namespace of the current class
For this, you need the semantic model. Assuming you used RegisterSyntaxNodeAction and declared a method AnalyzeClassNode, you can do this in its body:
var classNode = context.Node;
var model = context.SemanticModel;
var classSymbol = model.GetDeclaredSymbol(classNode);
And you get the namespace symbol with:
var @namespace = classSymbol.ContainingNamespace;
@namespace.MetadataName gives you the name of the innermost namespace as a string; @namespace.ToDisplayString() gives the full dotted name.
3 Get all used types with their full name
That's much more complex, and it really depends on what you're trying to achieve. To get something like "all dependent types, or imports", you would have to traverse the entire class node, get the symbol for every relevant node (I have no idea exactly which nodes that would be), and check its namespace or full metadata name.
Maybe you can elaborate a little more on this, so we can find out whether this is the right approach.
By the way, check out "Learn Roslyn Now", a site with a bunch of tutorials for Roslyn. Specifically, you want to check out part 3 (syntax nodes), part 7 (symbols), and part 10 (intro to analyzers).
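Putting the three steps together, an analyzer could be sketched roughly as follows. This is an illustration under assumptions, not a finished rule: the diagnostic ID NS0001, the "Company.Internal" prefix and the rule itself are made up, and "used types" is approximated as every simple name in the class body that binds to a type symbol. It assumes a reference to the Microsoft.CodeAnalysis.CSharp package.

```csharp
using System;
using System.Collections.Immutable;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class NamespaceAccessAnalyzer : DiagnosticAnalyzer
{
    // Hypothetical rule: only code inside this namespace may use its types.
    private const string RestrictedPrefix = "Company.Internal";

    private static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        id: "NS0001",
        title: "Restricted namespace used",
        messageFormat: "Class '{0}' in namespace '{1}' uses restricted type '{2}'",
        category: "Architecture",
        defaultSeverity: DiagnosticSeverity.Warning,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics =>
        ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();
        context.RegisterSyntaxNodeAction(AnalyzeClassNode, SyntaxKind.ClassDeclaration);
    }

    private static void AnalyzeClassNode(SyntaxNodeAnalysisContext context)
    {
        // 1. The current class.
        var classNode = (ClassDeclarationSyntax)context.Node;
        var classSymbol = context.SemanticModel.GetDeclaredSymbol(classNode, context.CancellationToken);
        if (classSymbol == null) return;

        // 2. Its namespace, as a full dotted name.
        string ownNamespace = classSymbol.ContainingNamespace.ToDisplayString();
        if (ownNamespace.StartsWith(RestrictedPrefix, StringComparison.Ordinal)) return;

        // 3. Names inside the class that bind to types. Nested classes are
        //    skipped here because they trigger this action on their own.
        var names = classNode
            .DescendantNodes(n => n == classNode || !(n is ClassDeclarationSyntax))
            .OfType<SimpleNameSyntax>();

        foreach (var name in names)
        {
            var used = context.SemanticModel.GetSymbolInfo(name, context.CancellationToken).Symbol as ITypeSymbol;
            if (used?.ContainingNamespace == null) continue;

            string usedNamespace = used.ContainingNamespace.ToDisplayString();
            if (usedNamespace.StartsWith(RestrictedPrefix, StringComparison.Ordinal))
            {
                context.ReportDiagnostic(Diagnostic.Create(
                    Rule, name.GetLocation(), classSymbol.Name, ownNamespace, used.ToDisplayString()));
            }
        }
    }
}
```

The analysis context already supplies the semantic model, so the analyzer does not open the project or create a compilation itself. Whether this is fast enough for a solution of your size is something you would have to measure.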
I've been attempting to set up FizzlerEx, found at http://fizzlerex.codeplex.com/. After adding the references to my project, I tried to run the example code given on the website. The entirety of my code is listed below.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack;
namespace Fizzler_Test
{
    class Program
    {
        static void Main(string[] args)
        {
            var web = new HtmlWeb();
            var document = web.Load("http://example.com/page.html");
            var page = document.DocumentNode;
            foreach (var item in page.QuerySelectorAll("div.item"))
            {
                var title = item.QuerySelector("h3:not(.share)").InnerText;
                var date = DateTime.Parse(item.QuerySelector("span:eq(2)").InnerText);
                var description = item.QuerySelector("span:has(b)").InnerHtml;
            }
        }
    }
}
However, this yields build errors, claiming that:
Error 1 'HtmlAgilityPack.HtmlNode' does not contain a definition for 'QuerySelectorAll' and no extension method 'QuerySelectorAll' accepting a first argument of type 'HtmlAgilityPack.HtmlNode' could be found (are you missing a using directive or an assembly reference?)
It would seem that QuerySelectorAll is not actually a part of HtmlNode, but given that this is the official example code taken verbatim from the website, I'd expect the creators understand how their library works. I'm at a loss as to what the actual issue could be.
A related problem seems to have been found here, but no suitable answer was ever found: Fizzler and QuerySelectorAll
It would seem that QuerySelectorAll is not actually a part of HtmlNode, but given that this is the official example code taken
verbatim from the website, I'd expect the creators understand how
their library works. I'm at a loss as to what the actual issue could
be.
You are correct about the first part, but not about the second, since the author of HAP isn't the author of FizzlerEx. The problem is elsewhere.
Simply by looking at the error, you get the only clue you need to solve this.
Error 1 'HtmlAgilityPack.HtmlNode' does not contain a definition for 'QuerySelectorAll' and no extension method 'QuerySelectorAll' accepting a first argument of type 'HtmlAgilityPack.HtmlNode' could be found (are you missing a using directive or an assembly reference?)
So, what does it tell us? That there is no method called QuerySelectorAll in the class HtmlNode in the namespace HtmlAgilityPack. If you look at the source code of HAP, you can easily confirm that the error message is correct, since there is no method by that name in the class we are looking at.
Source code for the HtmlAgilityPack.HtmlNode class
Where is this method that we want to use, but cannot find?
It's here, in the Fizzler.Systems.HtmlAgilityPack.HtmlNodeSelection class.
After trying a few things, I got the code to work exactly as written. The problem was in the references to the Fizzler and HAP assemblies.
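To illustrate how an extension method declared in HtmlNodeSelection ends up being callable on HtmlNode, here is a minimal sketch. It assumes both Fizzler.Systems.HtmlAgilityPack.dll and HtmlAgilityPack.dll are referenced, and it parses an inline HTML string (made up for this example) instead of downloading a page.

```csharp
using System;
using HtmlAgilityPack;
// Brings the QuerySelector/QuerySelectorAll extension methods into scope.
using Fizzler.Systems.HtmlAgilityPack;

public static class Program
{
    public static void Main()
    {
        var document = new HtmlDocument();
        document.LoadHtml(
            "<div class='item'><h3>First</h3></div>" +
            "<div class='item'><h3>Second</h3></div>");
        HtmlNode page = document.DocumentNode;

        // Extension-method syntax: looks like a member of HtmlNode, but the
        // compiler resolves it to the static method in HtmlNodeSelection.
        foreach (HtmlNode item in page.QuerySelectorAll("div.item"))
        {
            Console.WriteLine(item.QuerySelector("h3").InnerText);
        }

        // The same call written as an ordinary static method call.
        foreach (HtmlNode item in HtmlNodeSelection.QuerySelectorAll(page, "div.item"))
        {
            Console.WriteLine(item.QuerySelector("h3").InnerText);
        }
    }
}
```

If the first form fails to compile while the using directive is present, the compiler is most likely not seeing a matching pair of Fizzler and HtmlAgilityPack assemblies.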
If you download Fizzler you get HtmlAgilityPack at the same time. When you add the references in Visual Studio (assuming you use that), only add
Fizzler.Systems.HtmlAgilityPack.dll
HtmlAgilityPack.dll
Clean your solution and rebuild it and it should work!
You should add Fizzler by right-clicking References -> Manage NuGet Packages and searching online for it. You will find it listed as "Fizzler for HtmlAgilityPack" and can install it from there.
I am looking for the C# equivalent of Spring MVC's URL mapping using annotations, i.e., in Java I can write:
@Controller
@RequestMapping("/some-friendly-url/")
class MyController
{
    @RequestMapping(value = "/{type}/more-seo-stuff/{color}", method = RequestMethod.GET)
    public List<SomeDTO> get(@PathVariable String type,
                             @PathVariable String color,
                             int perPage) {
        ...
    }

    @RequestMapping(method = RequestMethod.POST)
    public String post(@RequestBody SomeDTO somethingNew) {
        ...
    }
}
It's actually much more powerful than this simple example, as anyone familiar with the concept knows.
I've tried to find out how to achieve the same with either ASP.NET MVC 3 or MonoRail. Both frameworks seem to be based on RoR's convention-over-configuration "//" philosophy, so achieving the above with them would be hard: it would require a lot of bespoke routing entries outside the controller class, with only a small subset of the functionality available via attributes. Spring.NET does not seem to address this either, stating that ASP.NET MVC's routing functionality is sufficient.
Is there anything out there in the C# world that provides this type of functionality? I was just about to start looking into writing something of my own to address this, but I was hoping not to have to do that.
Edit: I finally found the "AttributeRouting" project, which is also available on NuGet: https://github.com/mccalltd/AttributeRouting/wiki/1.-Getting-Started. It works perfectly. It doesn't support the full range of features that Spring MVC does, but it supports most of them.
Akos Lukacs also pointed to another good library, by ITCloud, below. Unfortunately, that one is not available on NuGet.
Sure, you can use Spring.NET:
http://www.springframework.net/
I eventually used https://github.com/mccalltd/AttributeRouting/wiki/1.-Getting-Started. I'm posting this only now for the sake of keeping the question complete.
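For readers arriving later: this is not the AttributeRouting library used above but ASP.NET Core's built-in attribute routing, which postdates this thread. A rough equivalent of the Java controller might look like the sketch below. The SomeDto shape, the default perPage value and the return values are assumptions for illustration, and the controller still needs an ASP.NET Core host that registers controllers (AddControllers and MapControllers).

```csharp
using System.Collections.Generic;
using Microsoft.AspNetCore.Mvc;

// Hypothetical DTO; the original SomeDTO is not shown in the question.
public sealed class SomeDto
{
    public string Type { get; set; } = "";
    public string Color { get; set; } = "";
}

[ApiController]
[Route("some-friendly-url")]
public sealed class MyController : ControllerBase
{
    // GET /some-friendly-url/{type}/more-seo-stuff/{color}?perPage=20
    // type and color bind from the route; perPage binds from the query string.
    [HttpGet("{type}/more-seo-stuff/{color}")]
    public ActionResult<List<SomeDto>> Get(string type, string color, [FromQuery] int perPage = 10)
    {
        return new List<SomeDto> { new SomeDto { Type = type, Color = color } };
    }

    // POST /some-friendly-url with a JSON body that binds to SomeDto.
    [HttpPost]
    public ActionResult<string> Post([FromBody] SomeDto somethingNew)
    {
        return "created";
    }
}
```

As with @RequestMapping, the class-level route acts as a prefix and the method-level templates are appended to it.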