have to extract data from a word file [closed] - c#

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions must demonstrate a minimal understanding of the problem being solved. Tell us what you've tried to do, why it didn't work, and how it should work. See also: Stack Overflow question checklist
Closed 9 years ago.
I have a peculiar problem: I have to extract information from a Word file. For example, given a resume, I need to extract the name, email address, phone number, address, university, experience, etc.
Every person may have their resume in a different format. Is there any way I can programmatically extract the information I need?
I need this information to fill in a registration form.

Even if you are initially attracted by the idea of using COM Interop from ASP.NET, don't do it.
http://support.microsoft.com/kb/257757
That said, it's important to know which version of Word we are talking about. The newer formats (.docx) can be treated as a zip archive containing XML files, and there are good, free libraries for them.
http://docx.codeplex.com/
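To illustrate the "zip containing XML files" point, here is a minimal sketch that pulls the paragraph text out of a .docx using only the base class library. It assumes the main document part lives at word/document.xml, which is the usual location but is not guaranteed; the DocxText class and its method names are made up for this example. It only gets you the raw text; deciding which paragraph is a name or an address is still up to you.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
static class DocxText
{
    // WordprocessingML main namespace used by w:p (paragraph) and w:t (text run).
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of each paragraph, joining its runs in document order.
    public static string[] ParagraphsFromXml(XDocument xml)
    {
        return xml.Descendants(W + "p")
                  .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)))
                  .ToArray();
    }

    // Opens the .docx as a zip archive and reads the main document part.
    // Assumption: the part is stored at "word/document.xml".
    public static string[] ReadParagraphs(string docxPath)
    {
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found in archive.");

            using (Stream stream = entry.Open())
            {
                return ParagraphsFromXml(XDocument.Load(stream));
            }
        }
    }
}
```

Usage would be something like `foreach (string line in DocxText.ReadParagraphs("resume.docx")) Console.WriteLine(line);`. This does not work for the old binary .doc format.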

Convert the Word document to HTML with Aspose.Words for .NET.
Then you can use regular expressions to search the converted Word and/or PDF documents.
Or you can use Html Agility Pack to parse the generated HTML documents and search for specific sections/paths.
PS:
If you have a regex for email that's shorter than one page, then the regex is incorrect.
Phone should be manageable, as long as you have to support only one country.
As for name and address, good luck with that.
Edit:
Like this
VB.NET:
Dim doc As New Aspose.Words.Document("filename.docORdocx")
doc.Save("filename.html", Aspose.Words.SaveFormat.Html)
C#:
Aspose.Words.Document doc = new Aspose.Words.Document("filename.docORdocx");
doc.Save("filename.html", Aspose.Words.SaveFormat.Html);
The component is here:
http://www.aspose.com/.net/word-component.aspx
To find out what a valid email address is, read RFC 822 (since superseded by RFC 2822 and RFC 5322):
http://www.faqs.org/rfcs/rfc822.html

Related

reading lines with variable numbers of fields from a file in c# [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 8 years ago.
How do you read the lines one by one if they have different numbers of fields and you need to know where each line ends?
For example:
A;B;C;D
E;F;G;H;J
'A' is a person and 'B', 'C' and 'D' are his friends. The same goes for the second line. I know I could just write every line with the same number of fields, but I think this is a neater way to do it.
Thank you.
There are two functions that make this really easy: StreamReader.ReadLine and String.Split.
You use StreamReader.ReadLine to get the entire line of text:
string lineOfInput = reader.ReadLine();
Then, you can split on the semicolons to get all your "fields":
string[] fields = lineOfInput.Split(';');
fields[0] will contain the "person" and the rest, his "friends".
See StreamReader and Split on MSDN.
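Putting the two calls together, a minimal sketch might look like this. ReadLine returns null once the input is exhausted, which is how you know the file has ended, and Split returns however many fields that particular line has. The FriendsFile class name and the dictionary return type are choices made for this example only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for the "person;friend;friend;..." format in the question.
static class FriendsFile
{
    // Maps each person (first field) to the friends listed after it.
    public static Dictionary<string, string[]> Parse(TextReader reader)
    {
        var result = new Dictionary<string, string[]>();
        string line;

        // ReadLine returns null at end of input, which ends the loop.
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue; // skip blank lines

            string[] fields = line.Split(';');

            // Everything after the first field is a friend; the count varies per line.
            string[] friends = new string[fields.Length - 1];
            Array.Copy(fields, 1, friends, 0, friends.Length);

            result[fields[0]] = friends;
        }
        return result;
    }
}
```

For a real file you would pass `new StreamReader(path)` inside a using block. Taking a TextReader also lets you try it with a StringReader.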
CSV files are deceptively simple. See this question, Simple csv reader?, for some details.
But I'd just use Sebastien Lorion's Fast CSV Reader from CodeProject:
http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
All you've got to do is set the correct field separator and you're good to go. It's fast and it works well. It implements IDataReader, so it acts like a normal .NET data reader implementation.

Parse HTML With C# [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I'd like to parse an HTML page using C#. The pages contain a lot of HTML tags; here's a sample from one of them:
<span class=text14 id="article_content"><!-- RELEVANTI_ARTICLE_START --><span ></b>The
most important component for <a
class=bluelink href="http://www.ynetnews.com/articles/0,7340,L-
3284752,00.html%20"' onmouseover='this.href=unescape(this.href)'
target=_blank>Israel</a>'s
security is its special relations with the American administration, and especially with its generous purse. When the Netanyahu government launches a great outcry against the <a ...
but I'd only like to get the content wrapped by the <span class=text14 id="article_content"> tag.
At first I thought about using preg match, but then realized it's not efficient at all.
I later read about Html Agility Pack and FizzlerEx.
I'd like to know whether it's possible to get the text wrapped by the specific tag I mentioned using these tools, and I'd be grateful if someone could tell me how fast this task could be performed.
It's pretty straightforward using Html Agility Pack:
var markup = @"<span class=text14 id=""article_content""><!-- RELEVANTI_ARTICLE_START --><span ></b>The most important component for <a class=bluelink href=""http://www.ynetnews.com/articles/0,7340,L-3284752,00.html%20""' onmouseover='this.href=unescape(this.href)' target=_blank>Israel</a>'s security is its special relations with the American administration, and especially with its generous purse. When the Netanyahu government launches a great outcry against the</span>";
var doc = new HtmlDocument();
doc.LoadHtml(markup);
var content = doc.GetElementbyId("article_content").InnerText;
Console.WriteLine(content);

I need to extract the url inside the string [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I need to extract the URLs inside a string.
In my case the HTML text is stored in the database. When I read that text, I need to find all the URLs in it and insert them into another table. Can you give me a way to find the URLs in SQL or C#?
This is a regular expression to find URLs in text:
Regex regx = new Regex("http://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?", RegexOptions.IgnoreCase);
MatchCollection matches = regx.Matches(txt);
One of the possible ways to do it is by using Regular expressions. First option is to extract HTML from the DB, then use Regular Expression to find the links directly. The second option is to locate link tags first, then extract url from them (again by using Regular expressions).
Here you can find information about how to use Regular Expressions in C#:
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx
On the other hand, writing the correct Regular Expression may not be so easy (it depends on how complex the URL is), but you should take a look at this question: regular expression for url
Also, here you can find a lot of information about regular expressions in general (keep in mind that there are some applications like RegexBuddy, that can help you a lot when it comes to testing your regular expressions): http://www.regular-expressions.info/
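As a rough sketch of the first option (run a regular expression directly over the HTML text), something like the following could be a starting point. The pattern is deliberately simple and is an assumption made for this example: it only matches http and https URLs, stops at whitespace, quotes and angle brackets, and will happily include trailing punctuation. The UrlFinder class is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the pattern is a simplification, not a full URL grammar.
static class UrlFinder
{
    // Matches "http://" or "https://" followed by anything up to
    // whitespace, a quote character, or an angle bracket.
    static readonly Regex UrlPattern =
        new Regex(@"https?://[^\s""'<>]+", RegexOptions.IgnoreCase);

    // Returns the distinct URLs found in the text, in order of first appearance.
    public static List<string> FindUrls(string text)
    {
        var urls = new List<string>();
        foreach (Match match in UrlPattern.Matches(text))
        {
            if (!urls.Contains(match.Value))
                urls.Add(match.Value);
        }
        return urls;
    }
}
```

Each returned string can then be inserted into the other table with an ordinary parameterized INSERT. If the URLs you care about are only those in link tags, parsing the HTML and reading the href attributes is likely to be more reliable than a regex.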

.Net how to play audio in the root of the website with System.Windows.Media.MediaPlayer() [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I use the code below in .NET. It works fine. The problem is that I want it to play audio stored under the root of the website. What changes should I make for this? Thanks
var sample = new System.Windows.Media.MediaPlayer();
sample.Open(new System.Uri(@"D:\voices\1.wav"));
sample.Play();
In a web application, this might look something like this:
sample.Open(new System.Uri(Server.MapPath("~/") + @"\voices\1.wav"));
I say might because that all depends on whether or not the voices folder exists in the root of the website. Additionally, you should probably leverage Path.Combine instead:
var path = Path.Combine(Server.MapPath("~/"), "voices", "1.wav");
sample.Open(new System.Uri(path));
Finally, I don't know what sample is, but the Open method may not work in a website. I'm making the assumption you know what Open does and whether or not it can work in a website.
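To show why Path.Combine is preferable to string concatenation, here is a small sketch. It produces the same result whether or not the root already ends with a directory separator, so you don't need to know exactly what Server.MapPath returns. The siteRoot parameter stands in for the Server.MapPath("~/") result, and the VoicePaths class is made up for this example.

```csharp
using System;
using System.IO;

// Hypothetical helper; siteRoot stands in for the value of Server.MapPath("~/").
static class VoicePaths
{
    // Builds <siteRoot>/voices/<fileName> using the platform's separator.
    // Path.Combine inserts a separator only where one is missing.
    public static string ForFile(string siteRoot, string fileName)
    {
        return Path.Combine(siteRoot, "voices", fileName);
    }
}
```

In the web application the call would be `VoicePaths.ForFile(Server.MapPath("~/"), "1.wav")`.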
Use Server.MapPath("~/"); that's the filesystem root for your website.
Dim path As String = Server.MapPath("~/") & "\voices\1.wav"
Note the backslashes: this is a filesystem path, not a URI.
Hope it helps.

How do I print a check? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
I need to write a .NET library for printing checks. Nothing fancy: you pass in the data, out comes the printed check. What's the best way to do this?
Constraints: The format of the check.
A lot of people use report generators for this. It's a bit overkill, but Crystal Reports will certainly do the job.
Other than that, this is a basic question about formatting printed output. Is that your intention?
Check out the PrintDocument class and you can do this yourself:
http://msdn.microsoft.com/en-us/magazine/cc188767.aspx
If you're printing checks remotely (i.e., you need to provide a check on the website that the user can print out), then using PDF is the easiest and most reliable way to accomplish that, but be careful of the security implications.
-Adam
Wow... that takes me back! In the old days printers were dot matrix and cheques were a continuous feed. I suppose nowadays cheques are preprinted single sheets and are printed with lasers/inkjets. Back then we'd just write plain ASCII to the printer and send printer-specific control/escape sequences for any specific formatting needs (picking the font size, line spacing, and page sizes).
Now I would likely try generating a PDF and then submitting that file for printing. It ought to be possible to do this with a plain text file too... though that's getting pretty close to old school. The report generator suggestion by Adam is a pretty good idea too.
Generally, cheque printing takes a lot of trial and error to get the formatting right. Printing on plain paper and holding it and a preprinted cheque up to the window is an easy way to check positioning without burning through tons of cheques.
One thing to note, though, is whether there is a requirement to track the control numbers preprinted on the cheques (aka the cheque number). Auditors sometimes require this, and it is also a reasonable guard against fraud (accounting for every preprinted cheque is not a terrible idea). To do this you need to handle reprinting, and marking individual cheques/cheque runs as "spoiled". You also need a manual process to collect and store spoiled cheques (for the auditors). On the whole it's a giant pain to get this right and can take more time than you might imagine.
Unless you're really ambitious, you order pre-printed checks and look at the check template. Fill in the blanks and there you are.
Since the format would be fairly fixed, I bet you could create a Word doc that holds the format and then programmatically insert the correct information and print it.
EDIT
Wow, pretty anti-MS, eh? You can use the full power of Word to visually set the format for the cheque, and there are libraries to modify Word docs in .NET, so I don't see why this isn't a slick solution.
