Hopefully you can help me as I do not find a solution neither on the web nor in my brain.
I am querying a issue-tracking-system (jira) via a webrequest. The systems answer is a json-file with a description of an issue represented by a string that has wiki-markdowns in it. It is possible to show this string 1:1 to the user. But I would prefer a solution to somehow parse the string and show the user not the textual markdown but the parsed elements like tables or numbered enumerations.
I use C# and currently I am showing the information in a richtextbox, but I guess richtextbox is not the element you choose for such a requirement.
For Example the following string is returned by the jira-system and I would like it to be shown as a "real" table and an enumeration to the user.
||criteria||status||
|concept 1|open|
|concept 2|open|
* topic 1
* topic 2
Hope you can help me
after long researches the answer is totally simple.
The Jira offers a conversion from markdown to html itself. When you query an issue via a URL just add ?expand=renderedFields to the URL like explained here https://community.atlassian.com/t5/Answers-Developer-Questions/How-can-I-get-the-rendered-HTML-of-a-wiki-markup-field-in-JIRA/qaq-p/495779
You will receive the answer like before and additional to that the html-writing of the answer. With that answer it is almost simple to show it in an webbrowser-element in the UI
Related
I am trying to write code (in C#) that can search for any plain-text word or phrase in a markdown file. Currently I'm doing this by a long-winded method: convert the markdown to HTML, strip HTML element tags out of the HTML text and then use a simple regular expression to search that for the word/phrase in question. Needless to say, this can be pretty slow.
A concrete example might show the problem. Say the markdown file contains
Something ***significant***
I would like to be able to find that by providing the search phrase something significant (i.e. ignoring the ***'s).
Is there an efficient way of doing this (i.e. that avoids the conversion to HTML) and doesn't involve me writing my own markdown parser?
Edit:
I want a generic way to search for any text or phrase in markdown text that contains any valid markdown formatting. The first answers were ways to match the specific text example I gave.
Edit:
I should have made it clear: this is required for a simple user-facing search and the markdown files could contain any valid markdown formatting. For this reason I need to be able to ignore anything in the markdown that the user wouldn't see as text if they converted the markdown to HTML. E.g. the markdown text that specifies an image (like ![Valid XHTML](http://w3.org/Icons/valid-xhtml10). should be skipped during the search). Converting to HTML produces decent results for the user because it then reasonably accurately reflects what a user sees (but it's just a slow solution, esp when there's a lot of markdown text to look through).
Use a regexp
var str = "Something ***significant***";
var regexp = new Regex("Something.+significant.+");
Console.WriteLine(regexp.Match(str).Success);
I want to do the same thing. I think of one way to achieve that.
Your method has two steps.
Get the plain text out of the markdown source (which has also two steps. Markdown->HTML and HTML->stripped to plain text)
Search within the plain text
Now, if the markdown source is persisted in a data store, then you may be able to also persist the plain text for search purposes only. So the step to extract the plain text from the markdown may be executed only once when persisting the markdown source (or every time the markdown source is updated), but the code that actually searches in the markdown could be executed immediately on the already persisted plain text data as many times as you want.
For example, if you have a relational DB with a column like markdown_text, you could also create a plain_text column and recreate its value every time the markdown_text column is changed.
Users won't bother if saving their markdown takes a few milliseconds (or even seconds) more than before. Users tend to feel safe when something that alters the system's state takes some time (they feel that something is actually happening in the system), rather than happen immediately (they feel that something went wrong and their command did not execute). But they will feel frustrated if searching took more than a few ms to complete. In general users want queries to complete immediately but commands to take some time (not more than a few seconds though).
Try this:
string input = "Something ***significant***";
string v = input.Replace("***", "");
Console.WriteLine(v)
look this example: enter link description here
I am currently working on an 8 Ball application for that uses a .ashx query string on the browser to receive numbers and return responses.
As of now, i am testing the string on my browser like so....
myapp/Modules/SMS/Services/ProcessEightBall.ashx?name=Mike&Session=123&Querystring=
Hello Mike, help me tell your destiny.
Choose:
1. Tell your Fate
2. Answer a Question
3. See into the Future.
basically, one adds a parameter to the Query string in response to the Question askede.g.
myapp/Modules/SMS/Services/ProcessEightBall.ashx?name=Mike&Session=1234&Querystring=1
Dear Mike,
You Shall Meet a Funny person and have a huge pizza for Lunch.
I need help on how to make a windows Forms application that can replace the Session and the QueryString using TextBoxes, and posts the string and receive the response on a section of the screen...of the 8ball :)
Well, you can still use the HttpUtility class in Windows Forms. So you can use Uri, get the query string, use HttpUtility.ParseQueryString to get all the values, change the ones you want, build the query string again (don't forget to encode the arguments as needed) and you're done.
i have just started learning linq because i like the sound of it. and so far i think im doing okay at it.
i was wondering if Linq could be used to find the following information in a file, like a group at a time or something:
Control
Text
Location
Color
Font
Control Size
example:
Label
"this is text that will
appear on a label control at runtime"
23, 77
-93006781
Tahoma, 9.0, Bold
240, 75
The above info will be in a plain file and wil have more than one type of control and many different sizes, font properties etc associated with each control listed. is it possible in Linq to parse the info in this txt file and then convert it to an actual control?
i've done this using a regex but regex is too much of a hassle to update/maintain.
thanks heaps
jase
Edit:
Since XML is for structured data, would Linq To XML be appropriate for this task? And would you please share with me any helpful/useful links that you may have? (Other than MSDN, because I am looking at that now. :))
Thank you all
If you are generating this data yourself, then I HIGHLY recommend you store this in an XML file. Then you can use XElement to parse this.
EDIT: This is exactly the type of thing that XML is designed for, structured data.
EDIT EDIT: In response to the second question, Linq to XML is exactly what your looking for:
For an example, here is a couple of links to code I have written that parses XML using XElements. It also creates a XML document.
Example 1 - Loading and Saving: have a look under the FromXML() and ToXML() methods.
Example 2 - Parsing a large XML doc: have a look under the ParseXml method.
Hope these get you going :D
LINQ is good for filtering off rows, selecting relevant columns etc.
Even if you use LINQ for this, you will still need regex to select the relevant text and do the parsing.
i'm generating HyperLinks, all of them (depending on the circunstance, could be 1, 2 or 1000) send to the same webform:
from default.aspx
to envia.aspx
i can't use session, or anything i already know, because i can't create as many methods i want (that would not be good, due to possible large numbers)
example, there are three lines i print on demand:
house [link]
car [link]
flower[link]
i want the three links to load the same aspx webform sending as a parameter a string with these lines.
i don't care if the answer is in vb.net or in c#, anything you could help it's ok (i'm using vb.net though)
can you use Query String?
envia.aspx?param1=something¶m2=somethingelse
in envia.aspx:
string param1 = Request["param1"];
string param2 = Request["param2"];
What about crosspage postbacks? Only used it once, but this sounds like a good candidate for it. See Cross-Page Posting in ASP.NET Web Pages, http://msdn.microsoft.com/en-us/library/ms178139.aspx
Does anyone have any suggestions as to how I can clean the body of incoming emails? I want to strip out disclaimers, images and maybe any previous email text that may be also be present so that I am left with just the body text content. My guess is it isn't going to be possible in any reliable way, but has anyone tried it? Are there any libraries geared towards this sort of thing?
In email, there is couple of agreed markings that mean something you wish to strip. You can look for these lines using regular expressions. I doubt you can't really well "sanitize" your emails, but some things you can look for:
Line starting with "> " (greater than then whitespace) marks a quote
Line with "-- " (two hyphens then whitespace then linefeed) marks the beginning of a signature, see Signature block on Wikipedia
Multipart messages, boundaries start with --, beyond that you need to do some searching to separate the message body parts from unwanted parts (like base64 images)
As for an actual C# implementation, I leave that for you or other SOers.
A few obvious things to look at:
if the mail is anything but pure plain text, the message will be multi-part mime. Any part whose type is "image/*" (image/jpeg, etc), can probably be dropped. In all likelyhood any part whose type is not "text/*" can go.
A HTML message will probably have a part of type "multipart/alternative" (I think), and will have 2 parts, one "text/plain" and one "text/html". The two parts should be just about equivalent, so you can drop the HTML part. If the only part present is the HTML bit, you may have to do a HTML to plain text conversion.
The usual format for quoted text is to precede the text by a ">" character. You should be able to drop these lines, unless the line starts ">From", in which case the ">" has been inserted to prevent the mail reader from thinking that the "From " is the start of a new mail.
The signature should start with "-- \r\n", though there is a very good chance that the trailing space will be missing.
Version 3 of OSBF-Lua has a mail-parsing library that will handle the MIME and split a message into its MIME parts and so on. I currently have a mess of Lua scripts that do
stuff like ignore most non-text attachments, prefer plain text to HTML, and so on. (I also wrap long lines to 80 characters while trying to preserve quoting.)
As far as removing previously quoted mail, the suggestions above are all good (you must subscribe to some ill-mannered mailing lists).
Removing disclaimers reliably is probably going to be hard. My first cut would be simply to maintain a library of disclaimers that would be stripped off the end of each mail message; I would write a script to make it easy for me to add to the library. For something more sophisticated I would try some kind of machine learning.
I've been working on spam filtering since Feb 2007 and I've learned that anything to do with email is a mess. A good rule of thumb is that whatever you want to do is a lot harder than you think it is :-(
Given your question "Is it possible to programmatically ‘clean’ emails?", I'd answer "No, not reliably".
The danger you face isn't really a technological one, but a sociological one.
It's easy enough to spot, and filter out, some aspects of the messages - like images. Filtering out signatures and disclaimers is, likewise, possible to achieve (though more of a challenge).
The real problem is the cost of getting it wrong.
What happens if your filter happens to remove a critical piece of the message? Can you trace it back to find the missing piece, or is your filtering desctructive? Worse, would you even notice that the piece was missing?
There's a classic comedy sketch I saw years ago that illustrates the point. Two guys working together on a car. One is underneath doing the work, the other sitting nearby reading instructions from a service manual - it's clear that neither guy knows what he's doing, but they're doing their best.
Manual guy, reading aloud: "Undo the bold in the centre of the oil pan ..." [turns page]
Tool guy: "Ok, it's out."
Manual guy: "... under no circumstances."
If you creating your own application i'd look into Regex, to find text and replace it. To make the application a little nice, i'd create a class Called Email and in that class i have a property called RAW and a property called Stripped.
Just some hints, you'll gather the rest when you look into regex!
SigParser has an assembly you can use in .NET. It gives you the body back in both HTML and text forms with the rest of the stuff stripped out. If you give it an HTML email it will convert the email to text if you need that.
var parser = new SigParser.EmailParsing.EmailParser();
var result = await parser.GetCleanedBodyAsync(new SigParser.EmailParsing.Models.CleanedBodyInput {
FromEmailAddress = "john.smith#example.com",
FromName = "John Smith",
TextBody = #"Hi Mark,
This is my message.
Thanks
John Smith
888-333-4434"
});
// This would print "Hi Mark,\r\nThis is my message."
Console.WriteLine(result.CleanedBodyPlain);