I'm new to QuickGraph. I followed the examples on the documentation page to add vertices & edges to my graph.
Now I want to display my graph on a Windows Form. I'm using Graphviz for that purpose, which generates a .DOT file as output. I'm following the code sample below for rendering:
IVertexAndEdgeListGraph<TVertex,TEdge> g= ...;
var graphviz = new GraphvizAlgorithm<TVertex,TEdge>(g);
string output = graphviz.Generate(new FileDotEngine(), "graph");
But, my compiler is not detecting FileDotEngine(). Moreover, I don't know what to do after the .DOT file is generated.
You have to provide a FileDotEngine yourself; see for instance this example on Github. A simple FileDotEngine that generates a jpg could be:
public sealed class FileDotEngine : IDotEngine
{
public string Run(GraphvizImageType imageType, string dot, string outputFileName)
{
string output = outputFileName;
File.WriteAllText(output, dot);
// assumes dot.exe is on the path:
var args = string.Format(@"{0} -Tjpg -O", output);
System.Diagnostics.Process.Start("dot.exe", args);
return output;
}
}
Then you could display the generated image in a picture box or similar.
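One thing to watch when displaying the result: Process.Start returns immediately, so the image may not exist yet when you try to load it. Here is a minimal sketch of a helper that waits for dot to finish and returns the image path. The class name DotRenderer and its methods are made up for illustration; it assumes dot.exe is on the path and relies on Graphviz's -O behaviour of naming the output after the input file plus the format extension.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of QuickGraph or Graphviz.
public static class DotRenderer
{
    // With -O, dot appends the format extension to the input file name,
    // so "graph" becomes "graph.jpg".
    public static string ImagePathFor(string dotFilePath)
    {
        return dotFilePath + ".jpg";
    }

    // Runs dot.exe on the given .DOT file and blocks until the image is written.
    public static string Render(string dotFilePath)
    {
        var startInfo = new ProcessStartInfo(
            "dot.exe", string.Format("\"{0}\" -Tjpg -O", dotFilePath))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    "dot.exe exited with code " + process.ExitCode);
            }
        }

        return ImagePathFor(dotFilePath);
    }
}
```

In the form you could then write something like pictureBox1.Image = Image.FromFile(DotRenderer.Render(output)); where pictureBox1 is whatever PictureBox you dropped on the form and output is the file name returned by graphviz.Generate.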
Another approach would be to host a WPF control in your winforms app and then use Graph# to display the graph. I haven't tried this myself, however.
I'm using C# and Xamarin Forms to create a phone app that (when a button is pressed) will pull specific HTML data from a website and save it into a text file (that the program can read from again later). I started with the tutorial in this video: https://www.youtube.com/watch?v=zvp7wvbyceo if you want to see what I started out with, and here's the code I have so far, made using this video https://www.youtube.com/watch?v=wwPx8QJn9Kk, in the "AboutViewModel.cs" file created in the video:
Image link (this is a new account, so I can't embed images)
Paste of the code itself (but the image gives you a better look at everything):
private Task WebScraper()
{
HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("https://www.flightview.com/airport/DAB-Daytona_Beach-FL/");
foreach (var item in doc.DocumentNode.SelectNodes("//td[@class='c1']"))
{
var itemstring = item;
File.WriteAllText("AirportData.txt", itemstring);
}
return Task.CompletedTask;
}
public ICommand OpenWebCommand { get; }
public ICommand WebScraperCommand { get; }
}
}
The only error I'm getting right now is "Cannot convert 'HtmlAgilityPack.HtmlNode' to 'string'", which I'm working on fixing, but I don't think this is the best solution, so anything you have is useful. Thanks :)
HtmlNode is an object, not a simple string. You probably want to use the OuterHtml property, but consult the docs to see if that is the right fit for your use case.
string output = string.Empty;
foreach (var item in doc.DocumentNode.SelectNodes("//td[@class='c1']"))
{
output += item.OuterHtml;
}
File.WriteAllText("AirportData.txt", output);
Note that you need to specify a path to a writable folder; the root folder of the app is not writable. See https://learn.microsoft.com/en-us/xamarin/xamarin-forms/data-cloud/data/files?tabs=windows
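As a rough sketch of the writable-folder point, you could build the path from Environment.GetFolderPath, which is the approach the linked page describes. The class name AirportDataStore and its methods are invented for this example; only the System.IO and Environment calls are standard .NET.

```csharp
using System;
using System.IO;

// Hypothetical helper for saving and re-reading the scraped text.
public static class AirportDataStore
{
    // Resolves a file name inside the per-app local data folder, which is writable.
    // SpecialFolderOption.Create makes sure the folder exists on platforms
    // where it is not created up front.
    public static string GetDataFilePath(string fileName)
    {
        string folder = Environment.GetFolderPath(
            Environment.SpecialFolder.LocalApplicationData,
            Environment.SpecialFolderOption.Create);
        return Path.Combine(folder, fileName);
    }

    public static void Save(string fileName, string contents)
    {
        File.WriteAllText(GetDataFilePath(fileName), contents);
    }

    // Returns an empty string if nothing has been saved yet.
    public static string Load(string fileName)
    {
        string path = GetDataFilePath(fileName);
        return File.Exists(path) ? File.ReadAllText(path) : string.Empty;
    }
}
```

With that in place, the last line of the snippet above would become AirportDataStore.Save("AirportData.txt", output); and the app can call AirportDataStore.Load("AirportData.txt") later to read it back.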
Currently, I use this code to extract text from a Rectangle (area).
public static class ReaderExtensions
{
public static string ExtractText(this PdfPage page, Rectangle rect)
{
var filter = new IEventFilter[1];
filter[0] = new TextRegionEventFilter(rect);
var filteredTextEventListener = new FilteredTextEventListener(new LocationTextExtractionStrategy(), filter);
var str = PdfTextExtractor.GetTextFromPage(page, filteredTextEventListener);
return str;
}
}
It works, but I don't know if it's the best way to do it.
Also, I wonder if the GetTextFromPage could be improved by the iText team to increase its performance, since I'm processing hundreds of pages in big PDFs and it usually takes more than 10 minutes to do it using my current configuration.
EDIT:
From the comments: It seems that iText can extract the text of multiple rectangles on the same page in one pass, something that can improve the performance (batched operations tend to be more efficient), but how?
MORE DETAILS!
My goal is to extract data from a PDF with multiple pages. Each page has the same layout: a table with rows and columns.
Currently, I'm using the method above to extract the text of each rectangle. But, as you see, the extraction isn't batched. It's only a rectangle at a time. How could I extract all the rectangles of a page in a single pass?
As already mentioned in a comment, I was surprised to see that the iText 7 LocationTextExtractionStrategy does not anymore contain something akin to the iText 5 LocationTextExtractionStrategy method GetResultantText(TextChunkFilter). This would have allowed you to parse the page once and extract text from text pieces in arbitrary page areas out of the box.
But it is possible to bring back that feature. One option for this would be to add it to a copy of the LocationTextExtractionStrategy. This would be kind of a long answer here, though. So I used another option: I use the existing LocationTextExtractionStrategy, and merely for the GetResultantText call I manipulate the underlying list of text chunks of the strategy. Instead of a generic TextChunkFilter interface I restricted filtering to the criteria at hand, the filtering by rectangular area.
public static class ReaderExtensions
{
public static string[] ExtractText(this PdfPage page, params Rectangle[] rects)
{
var textEventListener = new LocationTextExtractionStrategy();
PdfTextExtractor.GetTextFromPage(page, textEventListener);
string[] result = new string[rects.Length];
for (int i = 0; i < result.Length; i++)
{
result[i] = textEventListener.GetResultantText(rects[i]);
}
return result;
}
public static String GetResultantText(this LocationTextExtractionStrategy strategy, Rectangle rect)
{
IList<TextChunk> locationalResult = (IList<TextChunk>)locationalResultField.GetValue(strategy);
List<TextChunk> nonMatching = new List<TextChunk>();
foreach (TextChunk chunk in locationalResult)
{
ITextChunkLocation location = chunk.GetLocation();
Vector start = location.GetStartLocation();
Vector end = location.GetEndLocation();
if (!rect.IntersectsLine(start.Get(Vector.I1), start.Get(Vector.I2), end.Get(Vector.I1), end.Get(Vector.I2)))
{
nonMatching.Add(chunk);
}
}
nonMatching.ForEach(c => locationalResult.Remove(c));
try
{
return strategy.GetResultantText();
}
finally
{
nonMatching.ForEach(c => locationalResult.Add(c));
}
}
static FieldInfo locationalResultField = typeof(LocationTextExtractionStrategy).GetField("locationalResult", BindingFlags.NonPublic | BindingFlags.Instance);
}
The central extension is the LocationTextExtractionStrategy extension which takes a LocationTextExtractionStrategy which already contains the information from a page, restricts these information to those in a given rectangle, extracts the text, and returns the information to the previous state. This requires some reflection; I hope that is ok for you.
What's the best way to detect the language of a string?
If the context of your code has internet access, you can try to use the Google API for language detection.
http://code.google.com/apis/ajaxlanguage/documentation/
var text = "¿Dónde está el baño?";
google.language.detect(text, function(result) {
if (!result.error) {
var language = 'unknown';
for (l in google.language.Languages) {
if (google.language.Languages[l] == result.language) {
language = l;
break;
}
}
var container = document.getElementById("detection");
container.innerHTML = text + " is: " + language + "";
}
});
And, since you are using c#, take a look at this article on how to call the API from c#.
UPDATE:
That c# link is gone, here's a cached copy of the core of it:
string s = TextBoxTranslateEnglishToHebrew.Text;
string key = "YOUR GOOGLE AJAX API KEY";
GoogleLangaugeDetector detector =
new GoogleLangaugeDetector(s, VERSION.ONE_POINT_ZERO, key);
GoogleTranslator gTranslator = new GoogleTranslator(s, VERSION.ONE_POINT_ZERO,
detector.LanguageDetected.Equals("iw") ? LANGUAGE.HEBREW : LANGUAGE.ENGLISH,
detector.LanguageDetected.Equals("iw") ? LANGUAGE.ENGLISH : LANGUAGE.HEBREW,
key);
TextBoxTranslation.Text = gTranslator.Translation;
Basically, you need to create a URI and send it to Google that looks like:
http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&q=hello%20world&langpair=en%7ciw&key=your_google_api_key_goes_here
This tells the API that you want to translate "hello world" from English to Hebrew, to which Google's JSON response would look like:
{"responseData": {"translatedText":"שלום העולם"}, "responseDetails": null, "responseStatus": 200}
I chose to make a base class that represents a typical Google JSON response:
[Serializable]
public class JSONResponse
{
public string responseDetails = null;
public string responseStatus = null;
}
Then, a Translation object that inherits from this class:
[Serializable]
public class Translation: JSONResponse
{
public TranslationResponseData responseData =
new TranslationResponseData();
}
This Translation class has a TranslationResponseData object that looks like this:
[Serializable]
public class TranslationResponseData
{
public string translatedText;
}
Finally, we can make the GoogleTranslator class:
using System;
using System.Collections.Generic;
using System.Text;
using System.Web;
using System.Net;
using System.IO;
using System.Runtime.Serialization.Json;
namespace GoogleTranslationAPI
{
public class GoogleTranslator
{
private string _q = "";
private string _v = "";
private string _key = "";
private string _langPair = "";
private string _requestUrl = "";
private string _translation = "";
public GoogleTranslator(string queryTerm, VERSION version, LANGUAGE languageFrom,
LANGUAGE languageTo, string key)
{
_q = HttpUtility.UrlPathEncode(queryTerm);
_v = HttpUtility.UrlEncode(EnumStringUtil.GetStringValue(version));
_langPair =
HttpUtility.UrlEncode(EnumStringUtil.GetStringValue(languageFrom) +
"|" + EnumStringUtil.GetStringValue(languageTo));
_key = HttpUtility.UrlEncode(key);
string encodedRequestUrlFragment =
string.Format("?v={0}&q={1}&langpair={2}&key={3}",
_v, _q, _langPair, _key);
_requestUrl = EnumStringUtil.GetStringValue(BASEURL.TRANSLATE) + encodedRequestUrlFragment;
GetTranslation();
}
public string Translation
{
get { return _translation; }
private set { _translation = value; }
}
private void GetTranslation()
{
try
{
WebRequest request = WebRequest.Create(_requestUrl);
WebResponse response = request.GetResponse();
StreamReader reader = new StreamReader(response.GetResponseStream());
string json = reader.ReadLine();
using (MemoryStream ms = new MemoryStream(Encoding.Unicode.GetBytes(json)))
{
DataContractJsonSerializer ser =
new DataContractJsonSerializer(typeof(Translation));
Translation translation = ser.ReadObject(ms) as Translation;
_translation = translation.responseData.translatedText;
}
}
catch (Exception) { }
}
}
}
Fast answer: NTextCat (NuGet, Online Demo)
Long answer:
Currently the best way seems to be to use classifiers trained to assign a piece of text to one (or more) languages from a predefined set.
There is a Perl tool called TextCat. It has language models for the 74 most popular languages. There are a huge number of ports of this tool to different programming languages.
There were no ports for .NET, so I have written one: NTextCat on GitHub.
It is a pure .NET Framework DLL plus a command-line interface to it. By default, it uses a profile of 14 languages.
Any feedback is very much appreciated!
New ideas and feature requests are welcome too :)
An alternative is to use one of the numerous online services (e.g. the one from Google already mentioned, detectlanguage.com, langid.net, etc.).
A statistical approach using digraphs or trigraphs is a very good indicator. For example, here are the most common digraphs in English in order: http://www.letterfrequency.org/#digraph-frequency (one can find better or more complete lists). This method may have a better success rate than word analysis for short snippets of text because there are more digraphs in text than there are complete words.
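To make the idea concrete, here is a toy sketch using trigraphs (three-character sequences). It builds a set of trigraphs from a sample text per language and picks the language whose set covers most of the input. The class name, the method names, and the scoring rule are all invented for illustration; a real detector would use frequency-ranked profiles trained on a large corpus rather than a single sentence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy illustration only, not a production language detector.
public static class TrigramLanguageGuesser
{
    // Splits text into overlapping three-character sequences, padded with
    // spaces so that the start and end of the text count as word boundaries.
    public static List<string> Trigrams(string text)
    {
        string padded = " " + text.ToLowerInvariant().Trim() + " ";
        var result = new List<string>();
        for (int i = 0; i + 3 <= padded.Length; i++)
        {
            result.Add(padded.Substring(i, 3));
        }
        return result;
    }

    // A profile here is simply the set of trigraphs seen in the sample text.
    public static HashSet<string> BuildProfile(string sampleText)
    {
        return new HashSet<string>(Trigrams(sampleText));
    }

    // Returns the name of the profile that contains the most trigraphs
    // of the input, or null if no profile matches anything.
    public static string Guess(string text, IDictionary<string, HashSet<string>> profiles)
    {
        List<string> grams = Trigrams(text);
        string best = null;
        int bestHits = 0;
        foreach (KeyValuePair<string, HashSet<string>> profile in profiles)
        {
            int hits = grams.Count(g => profile.Value.Contains(g));
            if (hits > bestHits)
            {
                bestHits = hits;
                best = profile.Key;
            }
        }
        return best;
    }
}
```

You would call BuildProfile once per language on a reasonably large sample, keep the profiles in a dictionary keyed by language name, and then call Guess for each string you want to classify.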
If you mean the natural (ie human) language, this is in general a Hard Problem. What language is "server" - English or Turkish? What language is "chat" - English or French? What language is "uno" - Italian or Spanish (or Latin!) ?
Without paying attention to context, and doing some hard natural language processing (<----- this is the phrase to google for) you haven't got a chance.
You might enjoy a look at Frengly - it's a nice UI onto the Google Translate service which attempts to guess the language of the input text...
Do a statistical analysis of the string: split the string into words, get a dictionary for every language you want to test for, and then find the language with the highest word count.
In C# every string in memory is Unicode and carries no encoding information. Text files do not store the encoding either (sometimes only an indication of 8-bit or 16-bit).
If you want to make a distinction between two languages, you might find some simple tricks. For example if you want to recognize English from Dutch, the string that contains the "y" is mostly English. (Unreliable but fast).
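The word-count idea could look roughly like this. It is only a sketch: the class name and the tiny word lists in the usage note are made up, and a real implementation would load much larger dictionaries per language.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch of the dictionary word-count approach.
public static class WordListLanguageGuesser
{
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', ',', '.', ';', ':', '!', '?', '"', '(', ')' };

    // Counts, for every language, how many words of the text appear in that
    // language's word list, and returns the language with the highest count.
    // Returns null when no word matches any list.
    public static string Guess(string text, IDictionary<string, HashSet<string>> wordLists)
    {
        string[] words = text.ToLowerInvariant()
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries);

        string best = null;
        int bestCount = 0;
        foreach (KeyValuePair<string, HashSet<string>> entry in wordLists)
        {
            int count = 0;
            foreach (string word in words)
            {
                if (entry.Value.Contains(word))
                {
                    count++;
                }
            }
            if (count > bestCount)
            {
                bestCount = count;
                best = entry.Key;
            }
        }
        return best;
    }
}
```

For example, with an English list containing "the", "and", "is", "of" and a Dutch list containing "de", "het", "en", "is", "van", the sentence "Het huis is van de buurman" scores four Dutch hits against one English hit. Short strings with no known words give null, which is where this approach runs out of steam.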
CLD3 (Compact Language Detector v3) library from Google's Chromium browser
You could wrap the CLD3 library, which is written in C++.
We can use Regex.IsMatch(text, "[\\uxxxx-\\uxxxx]+") to detect a specific language by its script. Here xxxx is the 4-digit Unicode code point of a character.
To detect Arabic:
bool isArabic = Regex.IsMatch(yourtext, @"[\u0600-\u06FF]+");
You may use the C# package for language identification from Microsoft Research:
This package implements several algorithms for language
identification, and includes two sets of pre-compiled language
profiles. One set covers 52 languages and was trained on Wikipedia
(i.e. a well-written corpus); the other covers 26 languages and was
constructed from Twitter (i.e. a highly colloquial corpus). The
language identifiers are packaged up as a C# library, and be easily
embedded into other C# projects.
Download the package from the above link.
One alternative is to use 'Translator Text API' which is
... part of the Azure Cognitive Services API collection of machine
learning and AI algorithms in the cloud, and is readily consumable in
your development projects
Here's a quickstart guide on how to detect language from text using this API
Using C#, I need to pull data from a Word document. I have NetOffice for Word installed in the project. The data is in two parts.
First, I need to pull data from the document settings.
Second, I need to pull the content of controls in the document. The content of the fields includes checkboxes, a date, and a few paragraphs. The input method is via controls, so there must be some way to interact with the controls via the api, but I don't know how to do that.
Right now, I've got the following code to pull the flat text from the document:
private static string wordDocument2String(string file)
{
NetOffice.WordApi.Application wordApplication = new NetOffice.WordApi.Application();
NetOffice.WordApi.Document newDocument = wordApplication.Documents.Open(file);
string txt = newDocument.Content.Text;
wordApplication.Quit();
wordApplication.Dispose();
return txt;
}
So the question is: how do I pull the data from the controls from the document, and how do I pull the document settings (such as the title, author, etc. as seen from word), using either NetOffice, or some other package?
I did not bother to implement NetOffice, but the commands should mostly be the same (except probably for implementation and disposal methods).
Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
string file = "C:\\Hello World.docx";
Microsoft.Office.Interop.Word.Document doc = word.Documents.Open(file);
// look for a specific type of Field (there are about 200 to choose from).
foreach (Field f in doc.Fields)
{
if (f.Type == WdFieldType.wdFieldDate)
{
//do something
}
}
// example of the myriad properties that could be associated with "document settings"
WdProtectionType protType = doc.ProtectionType;
if (protType.Equals(WdProtectionType.wdAllowOnlyComments))
{
//do something else
}
The MSDN reference on Word Interop is where you will find information on just about anything you need access to in a Word document.
UPDATE:
After reading your comment, here are a few document settings you can access:
string author = doc.BuiltInDocumentProperties["Author"].Value;
string name = doc.Name; // this gives you the file name.
// not clear what you mean by "title"
As far as trying to understand what text you are getting from a "legacy control", I need more information as to exactly what kind of control you are extracting from. Try getting the name of the control/textbox/form/etc. from within the document itself and then search for that property online.
As a stab in the dark, here is an (incomplete) example of getting text from textboxes in the document:
List<string> textBoxText = new List<string>();
foreach (Microsoft.Office.Interop.Word.Shape s in doc.Shapes)
{
textBoxText.Add(s.TextFrame.TextRange.Text); //this could result in an error if there are shapes that don't contain text.
}
Another possibility is Content Controls, of which there are several types. They are often used to gather user input.
Here is some code to catch a rich text Content Control:
List<string> contentControlText = new List<string>();
foreach(ContentControl CC in doc.ContentControls)
{
if (CC.Type == WdContentControlType.wdContentControlRichText)
{
contentControlText.Add(CC.Range.Text);
}
}
I am currently developing an Excel macro which allows creating Bugs in a Bugzilla instance.
After some trial and error this now turns out to work fine.
I wanted to enhance the client so that it's also possible to add screenshots to the newly created bug.
The environment I'm using is a little bit tricky:
I have to use MS Excel for my task.
As Excel does not understand XML-RPC, I downloaded an interface DLL (CookComputing.XmlRpcV2.dll from xml-rpc.net) which makes the XML-RPC interface accessible from .NET.
Then I created an additional DLL which can be called from Excel macros (using COM interop).
As already mentioned, this is working fine for tasks like browsing or adding new bugs.
But when adding an attachment to the bug, the image must be converted into a base64 data type. Although this seems to work fine and although the creation of the screenshot seems to succeed, the image seems to be corrupted and cannot be displayed.
Here's what I do to add the image:
The Bugzilla add_attachment method accepts a struct as input:
http://www.bugzilla.org/docs/4.0/en/html/api/Bugzilla/WebService/Bug.html#add_attachment.
This type was defined in C# and is visible also in VBA.
This is the struct definition:
[ClassInterface(ClassInterfaceType.AutoDual)]
public class TAttachmentInputData
{
public string[] ids;
public string data; // base64-encoded data
public string file_name;
public string summary;
public string content_type;
public string comment;
public bool is_patch;
public bool is_private;
public void addId(int id)
{
ids = new string[1];
ids[0] = id.ToString();
}
public void addData(string strData)
{
try
{
byte[] encData_byte = new byte[strData.Length];
encData_byte = System.Text.Encoding.ASCII.GetBytes(strData);
string encodedData = Convert.ToBase64String(encData_byte);
data = new Byte[System.Text.Encoding.ASCII.GetBytes(encodedData).Length];
data = System.Text.Encoding.ASCII.GetBytes(encodedData);
}
catch (Exception e)
{
throw new Exception("Error in base64Encode" + e.Message);
}
}
This is the part in my macro where I would like to add the attachment:
Dim attachmentsStruct As New TAttachmentInputData
fname = attachmentFileName
attachmentsStruct.file_name = GetFilenameFromPath(fname)
attachmentsStruct.is_patch = False
attachmentsStruct.is_private = False
'multiple other definitions
Open fname For Binary As #1
attachmentsStruct.addData (Input(LOF(1), #1))
Close #1
attachmentsStruct.file_name = GetFilenameFromPath(fname)
Call BugzillaClass.add_attachment(attachmentsStruct)
Where BugzillaClass is the interface exposed from my DLL to Excel VBA.
The method add_attachment refers to the XML-RPC method add_attachment.
I assume that my problem is the conversion from the binary file into base64.
This is done using the addData method in my C# DLL.
Is the conversion done correctly there?
Any idea why the images are corrupted?
I think the issue is that you are reading in binary data in the macro, but the addData method is expecting a string. Try declaring the parameter in addData as byte[].
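To illustrate why the string round trip corrupts the image: Encoding.ASCII.GetBytes replaces every character above 0x7F with '?', so most binary data does not survive it. Below is a minimal sketch of encoding from raw bytes instead. The class and method names are invented for this example, and whether your COM interop layer passes a VBA Byte() array straight through as byte[] is an assumption you would need to check in your setup.

```csharp
using System;
using System.IO;

// Hypothetical replacement for the string-based addData logic.
public static class AttachmentEncoding
{
    // Base64-encodes the raw bytes directly, with no text encoding step in between.
    public static string ToBase64(byte[] rawBytes)
    {
        if (rawBytes == null)
        {
            throw new ArgumentNullException("rawBytes");
        }
        return Convert.ToBase64String(rawBytes);
    }

    // Alternative: let the DLL read the file itself, so VBA only passes a path.
    public static string FileToBase64(string path)
    {
        return ToBase64(File.ReadAllBytes(path));
    }
}
```

On the VBA side you would then either read the file into a Byte array and pass that, or simply hand the file name to a method like FileToBase64 and skip the Open ... For Binary block entirely.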