Using LINQ for this - C#

I have just started learning LINQ because I like the sound of it, and so far I think I'm doing okay at it.
I was wondering if LINQ could be used to find the following information in a file, a group at a time or something like that:
Control
Text
Location
Color
Font
Control Size
example:
Label
"this is text that will
appear on a label control at runtime"
23, 77
-93006781
Tahoma, 9.0, Bold
240, 75
The above info will be in a plain text file and will have more than one type of control, with many different sizes, font properties, etc. associated with each control listed. Is it possible in LINQ to parse the info in this txt file and then convert it to an actual control?
I've done this using a regex, but a regex is too much of a hassle to update/maintain.
Thanks heaps,
Jase
Edit:
Since XML is for structured data, would LINQ to XML be appropriate for this task? And would you please share any helpful/useful links that you may have? (Other than MSDN, because I am looking at that now. :))
Thank you all

If you are generating this data yourself, then I HIGHLY recommend you store it in an XML file. Then you can use XElement to parse it.
EDIT: This is exactly the type of thing that XML is designed for: structured data.
EDIT EDIT: In response to the second question, LINQ to XML is exactly what you're looking for.
For an example, here are a couple of links to code I have written that parses XML using XElements. It also creates an XML document.
Example 1 - Loading and Saving: have a look under the FromXML() and ToXML() methods.
Example 2 - Parsing a large XML doc: have a look under the ParseXml method.
Hope these get you going :D
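To give you a feel for it, here is a minimal sketch of the LINQ to XML approach. The element and attribute names below are an assumed layout I made up for illustration, not anything your file has to follow:

// Sketch only. Assumes a hypothetical controls.xml shaped like:
// <Controls>
//   <Control Type="Label">
//     <Text>this is text that will appear on a label control at runtime</Text>
//     <Location X="23" Y="77" />
//     <Color>-93006781</Color>
//     <Font Name="Tahoma" Size="9.0" Style="Bold" />
//     <Size Width="240" Height="75" />
//   </Control>
// </Controls>
using System;
using System.Drawing;
using System.Linq;
using System.Windows.Forms;
using System.Xml.Linq;

static class ControlLoader
{
    public static Label[] LoadLabels(string path)
    {
        XDocument doc = XDocument.Load(path);

        return doc.Root
            .Elements("Control")
            .Where(c => (string)c.Attribute("Type") == "Label")
            .Select(c => new Label
            {
                Text = (string)c.Element("Text"),
                Location = new Point(
                    (int)c.Element("Location").Attribute("X"),
                    (int)c.Element("Location").Attribute("Y")),
                ForeColor = Color.FromArgb((int)c.Element("Color")),
                Font = new Font(
                    (string)c.Element("Font").Attribute("Name"),
                    (float)c.Element("Font").Attribute("Size"),
                    (FontStyle)Enum.Parse(typeof(FontStyle), (string)c.Element("Font").Attribute("Style"))),
                Size = new Size(
                    (int)c.Element("Size").Attribute("Width"),
                    (int)c.Element("Size").Attribute("Height"))
            })
            .ToArray();
    }
}

The returned labels can then be added straight to a form with this.Controls.AddRange(ControlLoader.LoadLabels("controls.xml")).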

LINQ is good for filtering out rows, selecting the relevant columns, etc.
Even if you use LINQ for this, you will still need a regex to pick out the relevant text and do the parsing.
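As a rough sketch of that split (assuming, purely for illustration, that each control occupies exactly six lines; the multi-line text in your example would need a regex or smarter splitting on top of this):

// Sketch: LINQ does the grouping, something else does the field parsing.
using System;
using System.IO;
using System.Linq;

var blocks = File.ReadLines("controls.txt")
    .Select((line, index) => new { line, index })
    .GroupBy(x => x.index / 6, x => x.line)
    .Select(g => g.ToArray());

foreach (var block in blocks)
{
    string controlType = block[0];   // e.g. "Label"
    string text = block[1];          // still needs unquoting
    // block[2]..block[5]: location, color, font and size - a regex or
    // string.Split can take over from here.
}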

Related

How to Read a particular Data from XML file and Write it to an Existing Excel Sheet using c#

I have an XML file with some values like Good and Bad under the tag Quality. I want to read the XML file and print the ones which are Bad into an existing Excel sheet. Can anyone help me, please? My XML file looks like the sample below. I want to write the entire HYDR.Instrument id and HYDR.Quality, but only for Bad values in the HYDR.Quality element.
<HYDR.Instrument id="ABR">
<HYDR.Quality>Good</HYDR.Quality>
<HYDR.Value>0</HYDR.Value>
</HYDR.Instrument>
<HYDR.Instrument id="ABR_DUMMY">
<HYDR.Quality>Bad</HYDR.Quality>
<HYDR.Value>0</HYDR.Value>
</HYDR.Instrument>
<HYDR.Instrument id="ABR_LOOP_JP">
<HYDR.Quality>Good</HYDR.Quality>
<HYDR.Value>15.208 kg/cm2g</HYDR.Value>
</HYDR.Instrument>
<HYDR.Instrument id="ABR_MOV_12">
<HYDR.Quality>Good</HYDR.Quality>
<HYDR.Value>0</HYDR.Value>
</HYDR.Instrument>
Basically you need to use two libraries to get to the answer you want:
First, you need to load the XML file, so I suggest using the LINQ to XML library; you can start from here.
Then, you need to write the filtered XML elements to Excel; I suggest using the Aspose library; you can start learning from here.
Using these two libraries, you can achieve what you want.
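The filtering half might look roughly like this (a sketch only, and it assumes your HYDR.Instrument elements are wrapped in a single root element so the file is well-formed XML):

using System;
using System.Linq;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        XDocument doc = XDocument.Load("instruments.xml");

        var badInstruments =
            from instrument in doc.Descendants("HYDR.Instrument")
            where (string)instrument.Element("HYDR.Quality") == "Bad"
            select new
            {
                Id = (string)instrument.Attribute("id"),
                Quality = (string)instrument.Element("HYDR.Quality")
            };

        foreach (var item in badInstruments)
        {
            Console.WriteLine("{0}: {1}", item.Id, item.Quality);
            // Instead of writing to the console, this is where you would put
            // each value into a cell of the existing sheet using your Excel
            // library of choice (Aspose.Cells, for example).
        }
    }
}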

C# Parse text file

I am trying to parse a file in MVC C#; see the format below. Since it's not JSON, I cannot use the JavaScript serializer to deserialize it to an object. The other option is to use LINQ, read line by line, and retrieve the desired values. Could anyone recommend a more efficient way to do it?
The first field I need to retrieve is the ASSAY_NUMBER (for example, the value 877) from ASSAYS,
and then the ASSAY_STATUS field from TEST_REPLICATE, of which there could be multiple nodes. Thanks.
LOAD_HEADER
{
EXPERIMENT_FILE_NAME "xyz.json"
EXPERIMENT_START_DATE_TIME 05.21.2012 03:44:01
OPERATOR_ID "Q_SI"
}
ASSAYS
{
ASSAY_NUMBER 877
ASSAY_VERSION 4
ASSAY_CALIBRATION_VERSION 1
}
TEST_REPLICATE
{
REPLICATE_ID 1985
ASSAY_NUMBER 877
ASSAY_VERSION 4
ASSAY_STATUS Research
}
TEST_REPLICATE
{
REPLICATE_ID 1985
ASSAY_NUMBER 877
ASSAY_VERSION 4
ASSAY_STATUS Research
}
You could either hack something together or use a parser generator like ANTLR or Coco/R. Both can generate parsers in C#.
I'm more fond of using a parser-combinator (a tool for constructing parsers using parser building blocks) than parser generators. I've had passable experience with Piglet, which is written with/for C#, and is pretty easy to use, and amazing experience with FParsec, but it's written for F#.
As far as parser generators go, there are those suggested by stmax, and there is also TinyPG, which a member recommended me once.
You can also roll your own parser. I suggest basing it on some sort of state machine model, though in this simple case, like Kirk Woll suggested, you could probably get by with some plain old string manipulation.
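For the hand-rolled route, a sketch like the following is usually enough for this kind of block format (the file name and class name here are just placeholders):

// Sketch of a simple line-by-line parser for the BLOCK_NAME { KEY value } format.
using System;
using System.Collections.Generic;
using System.IO;

static class BlockParser
{
    static void Main()
    {
        // Usage: grab ASSAY_NUMBER from ASSAYS and ASSAY_STATUS from each TEST_REPLICATE.
        foreach (var block in Parse("report.txt"))
        {
            if (block.Key == "ASSAYS")
                Console.WriteLine("Assay number: " + block.Value["ASSAY_NUMBER"]);
            else if (block.Key == "TEST_REPLICATE")
                Console.WriteLine("Replicate status: " + block.Value["ASSAY_STATUS"]);
        }
    }

    // Yields one (block name, key/value pairs) entry per block in the file.
    static IEnumerable<KeyValuePair<string, Dictionary<string, string>>> Parse(string path)
    {
        string currentBlock = null;
        Dictionary<string, string> values = null;

        foreach (string rawLine in File.ReadLines(path))
        {
            string line = rawLine.Trim();
            if (line.Length == 0)
                continue;

            if (line == "{")
            {
                values = new Dictionary<string, string>();
            }
            else if (line == "}")
            {
                yield return new KeyValuePair<string, Dictionary<string, string>>(currentBlock, values);
                currentBlock = null;
                values = null;
            }
            else if (values == null)
            {
                currentBlock = line;                      // block name, e.g. ASSAYS
            }
            else
            {
                int split = line.IndexOf(' ');            // KEY value
                values[line.Substring(0, split)] = line.Substring(split + 1).Trim('"');
            }
        }
    }
}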
I think the answer to this hinges upon whether or not there will ever be more than one ASSAY_NUMBER value in the file. If so, the easiest and surest way I know is to read the file line-by-line and get the data you desire.
If, however, you know that each file is unique to a specific ASSAY_NUMBER, you have a much simpler answer: read the file as one string and use a regex to pull out the information you desire. I am not an expert on regex, but there are enough examples online that you should be able to create one that works.

Fillable doc files

I have samples of some documents in .doc format, and I need to create some "fillable" areas instead of certain values in the samples. Then I need to fill these documents automatically using C#. So what do you think about it? Is that possible? Thanks in advance, guys! P.S.: if you need some information from me, please feel free to ask me for additions to my question.
Besides simply injecting/replacing text in the document itself, you could also utilize DOCVARIABLEs. You can define/create them in your document and then set the values in code.
Using DOCVARIABLEs, you separate the design of the Word doc (where the text is shown) from setting the values, which might be useful for your case.
You can certainly manipulate them using C#, but a bit more info using a VBA sample can be found at What is a DOCVARIABLE in word.
One little warning when using C# to edit them: if you set the value of a DOCVARIABLE to "" (an empty string), the DOCVARIABLE is deleted from the document. If you want to keep the DOCVARIABLE around, set its value to " " (a space).
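For what it's worth, setting one from C# through the Word interop assemblies looks roughly like this; the paths and the variable name are placeholders, and it assumes the DOCVARIABLE field already exists in the template:

// Rough sketch using the Microsoft.Office.Interop.Word PIA.
using Word = Microsoft.Office.Interop.Word;

class DocVariableFiller
{
    static void Main()
    {
        var app = new Word.Application();
        Word.Document doc = app.Documents.Open(@"C:\templates\sample.doc");

        // Remember: an empty string would delete the variable, so use " " to keep it.
        doc.Variables["CustomerName"].Value = "Acme Ltd";
        doc.Fields.Update();   // refresh the DOCVARIABLE fields so the new value shows up

        doc.SaveAs(@"C:\output\filled.doc");
        doc.Close();
        app.Quit();
    }
}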
Yes, this is possible. You can create placeholder areas in your document which you search for and change when you access the file. Check these results on how to modify a Word document using C#.

Put prefix on all elements of xml document

I'm using C# and I need to create an XML document. OK, I did, but in each element I need to put a tc prefix.
The only way that I know is using xmlDoc.CreateElement("tc", "node1", "file.xsd"), but that is very tedious because I have lots of tags and my program is already written.
Is this the only way?
This might work for you:
XmlReader - I need to edit an element and produce a new one
If you're lucky enough to be using .NET 3.5, take a look at LINQ to XML.
Here's a document on How to: Create a Document with Namespaces (C#) (LINQ to XML) from MSDN for the LINQ to XML API.
And if you've never seen LINQ to XML before, take a look at this 5 minute overview
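With LINQ to XML you only declare the prefix once on the root; every element created in that namespace is then serialized as <tc:...>. A minimal sketch (the namespace URI is just a placeholder):

using System.Xml.Linq;

class Example
{
    static XDocument Build()
    {
        XNamespace tc = "http://example.com/file.xsd";   // placeholder URI

        return new XDocument(
            new XElement(tc + "root",
                // This single attribute maps the "tc" prefix to the namespace.
                new XAttribute(XNamespace.Xmlns + "tc", tc),
                new XElement(tc + "node1", "value1"),
                new XElement(tc + "node2", "value2")));
    }
}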

Suggestions on how to build an HTML Diff tool?

In this post I asked if there were any tools that compare the structure (not actual content) of 2 HTML pages. I ask because I receive HTML templates from our designers, and frequently miss minor formatting changes in my implementation. I then waste a few hours of designer time sifting through my pages to find my mistakes.
The thread offered some good suggestions, but there was nothing that fit the bill. "Fine, then", thought I, "I'll just crank one out myself. I'm a halfway-decent developer, right?".
Well, once I started to think about it, I couldn't quite figure out how to go about it. I can crank out a data-driven website easily enough, or do a CMS implementation, or throw documents in and out of BizTalk all day. Can't begin to figure out how to compare HTML docs.
Well, sure, I have to read the DOM, and iterate through the nodes. I have to map the structure to some data structure (how??), and then compare them (how??). It's a development task like none I've ever attempted.
So now that I've identified a weakness in my knowledge, I'm even more challenged to figure this out. Any suggestions on how to get started?
clarification: the actual content isn't what I want to compare -- the creative guys fill their pages with lorem ipsum, and I use real content. Instead, I want to compare structure:
<div class="foo">lorem ipsum</div>
is different from
<div class="foo"><p>lorem ipsum</p></div>
The DOM is a data structure - it's a tree.
Run both files through the following Perl script, then use diff -iw to do a case-insensitive, whitespace-ignoring diff.
#! /usr/bin/perl -w
# Reads HTML on STDIN and prints one tag per line, replacing every run of
# text with the placeholder "(text)", so the diff only sees the structure.
use strict;
undef $/;                # slurp mode: read the whole input at once
my $html = <STDIN>;
while ($html =~ /\S/) {
    if ($html =~ s/^\s*<//) {
        # consume one tag and print it on its own line
        $html =~ s/^(.*?)>// or die "malformed HTML";
        print "<$1>\n";
    } else {
        # consume a run of text and print a placeholder instead
        $html =~ s/^([^<]+)//;
        print "(text)\n";
    }
}
#Mike - that would compare everything, including the content of the page, which isn't what the original poster wanted.
Assuming that you have access to the browser's DOM (by writing a Firefox/IE plugin or whatever), I would probably put all of the HTML elements into a tree, then compare the two trees. If the tag name is different, then the node is different. You might want to stop enumerating at a certain point (you probably don't care about span, bold, italic, etc. - maybe only worry about divs?), since some tags are really the content, rather than the structure, of the page.
If I were to tackle this issue, I would do this:
Plan for some kind of a DOM for HTML pages. Start lightweight and then add more as needed. I would use the composite pattern for the data structure, i.e. every element has a children collection of the base class type.
Create a parser to parse HTML pages.
Using the parser, load the HTML elements into the DOM.
After the pages have been loaded into the DOM, you have a hierarchical snapshot of your HTML pages' structure.
Keep iterating through every element on both sides until the end of the DOM. You'll find the diff in the structure when you hit a mismatch of element type.
In your example, you would have only a div element object loaded on one side; on the other side you would have a div element object loaded with one child element of type paragraph. Fire up your iterator: on the first pass you'll match up the div elements; on the second pass you'll match up the paragraph with nothing. You've got your structural difference.
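A rough sketch of that element-by-element walk, using HtmlAgilityPack here purely as an example parser (any HTML parser that gives you a DOM tree would do):

using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class StructureDiff
{
    public static bool SameStructure(string htmlA, string htmlB)
    {
        var docA = new HtmlDocument();
        docA.LoadHtml(htmlA);
        var docB = new HtmlDocument();
        docB.LoadHtml(htmlB);
        return SameStructure(docA.DocumentNode, docB.DocumentNode);
    }

    static bool SameStructure(HtmlNode a, HtmlNode b)
    {
        // Compare tag names only; text, attributes and IDs are deliberately ignored.
        if (a.Name != b.Name)
            return false;

        List<HtmlNode> childrenA = a.ChildNodes.Where(n => n.NodeType == HtmlNodeType.Element).ToList();
        List<HtmlNode> childrenB = b.ChildNodes.Where(n => n.NodeType == HtmlNodeType.Element).ToList();

        if (childrenA.Count != childrenB.Count)
            return false;

        for (int i = 0; i < childrenA.Count; i++)
            if (!SameStructure(childrenA[i], childrenB[i]))
                return false;

        return true;
    }
}

Reporting where the mismatch happened, rather than just returning false, is a small extension of the same walk.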
I think some of the suggestions above don't take into account that there are other tags in the HTML between two pages which would be textually different, but the resulting HTML markup is functionally equivalent. Danimal lists control IDs as an example.
The following two markups are functionally identical, but would show up as different if you simply compared tags:
<div id="ctl00_TopNavHome_DivHeader" class="header4">foo</div>
<div class="header4">foo</div>
I was going to suggest Danimal write an HTML translation which looks for the HTML tags and converts both docs into a simplified version of both which omits ID tags and any other tags you designate as irrelevant. This’d likely have to be a work in progress, as you ignore certain attributes/tags and then run into new ones which you also want to ignore.
However, I like the idea of using XmlSchemaInference to boil it down to the XML schema, then use a diff tool which understands XML rules.
See http://www.semdesigns.com/Products/SmartDifferencer/index.html for a tool that is parameterized by language grammar, and produces deltas in terms of language elements (identifiers, expressions, statements, blocks, methods, ...) inserted, deleted, moved, replaced, or with identifiers substituted across it consistently. This tool ignores whitespace reformatting (e.g., different line breaks or layouts) and semantically indistinguishable values (e.g., it knows that 0x0F and 15 are the same value).
This can be applied to HTML using an HTML parser.
EDIT: 9/12/2009. We've built an experimental SmartDiff tool using an HTML editor.
http://www.mugo.ca/Products/Dom-Diff
Works with FF 3.5. I haven't tested FF 3.6 yet.
This has been an excellent start. A few more clarifications/comments:
I probably don't care about IDs, since .net will mangle them
some of the structure will be in a repeater or other such control, so I might end up having more or fewer repeating elements
further thought:
I think a good start would be to assume the HTML is XHTML compliant. I could then infer the schema (using the new .NET XmlSchemaInference methods), then diff the schemata. I can then look at the differences and consider whether or not they're significant.
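Something along these lines, assuming the pages really are well-formed XHTML so they can be read as XML (file names are placeholders):

using System.Xml;
using System.Xml.Schema;

static class SchemaDump
{
    // Infers a schema from an XHTML page and writes it out, so the resulting
    // .xsd files for the two pages can be compared with an XML-aware diff tool.
    public static void InferTo(string xhtmlPath, string xsdPathPrefix)
    {
        XmlSchemaSet schemas;
        using (XmlReader reader = XmlReader.Create(xhtmlPath))
        {
            schemas = new XmlSchemaInference().InferSchema(reader);
        }

        // Inference can produce one schema per namespace, so write each one out separately.
        int i = 0;
        foreach (XmlSchema schema in schemas.Schemas())
        {
            var settings = new XmlWriterSettings { Indent = true };
            using (XmlWriter writer = XmlWriter.Create(xsdPathPrefix + i++ + ".xsd", settings))
                schema.Write(writer);
        }
    }
}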
My suggestion is just the basic way of doing it... Of course, to tackle the issue you mentioned, additional rules must be applied, which in your case means: once we have a matching div element, apply attribute/property matching rules and so on.
To be honest, there are many complicated rules that need to be applied for the comparison; it's not just simple matching of one element to another. For example, what happens if you have duplicates?
E.g. one div element on one side, and two div elements on the other side. How are you going to match up which div elements go together?
There are a lot of other complicated issues that you will find in the comparison world. I'm speaking from experience (part of my job is to maintain my company's text comparison engine).
Take a look at Beyond Compare. It has an XML comparison feature that can help you out.
You may also have to consider that the 'content' itself could contain additional mark-up so it's probably worth stripping out everything within certain elements (like <div>s with certain IDs or classes) before you do your comparison. For example:
<div id="mainContent">
<p>lorem ipsum etc..</p>
</div>
and
<div id="mainContent">
<p>Here is some real content<img class="someImage" src="someImage.jpg" /></p>
<ul>
<li>and</li>
<li>some</li>
<li>more..</li>
</ul>
</div>
Pretty Diff can do this. It will compare the code structure only regardless of differences to white space, comments, or even content. Just be sure to check the option "Normalize Content and String Literals".
http://prettydiff.com/
I would use (or contribute to) html5lib and its SAX output. Just zip through the 2 SAX streams looking for mismatches and highlight the whole corresponding subtree.
I don't know of any tool, but I know there is a simple way to do this:
First, use a regular expression tool to strip out all the text in your HTML file. You can use the regular expression (?<=^|>)[^><]+?(?=<|$) to search for the text and replace it with an empty string (""), i.e. delete all the text. After this step, you will have only the HTML markup tags. There are a lot of free regular expression tools out there.
Then, you repeat the first step for the original HTML file.
Last, you use a diff tool to compare the two sets of HTML markup. This will show what is missing from one set compared to the other.
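If you'd rather do the stripping step in C# than in a separate regex tool, a rough version of the same idea (input and output paths are placeholders):

using System.IO;
using System.Text.RegularExpressions;

class StripText
{
    static void Main()
    {
        string html = File.ReadAllText(@"C:\temp\page.html");

        // Same pattern as above: remove everything between tags, keeping only the markup.
        string tagsOnly = Regex.Replace(html, @"(?<=^|>)[^><]+?(?=<|$)", "");

        File.WriteAllText(@"C:\temp\page.tags.html", tagsOnly);
    }
}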
If I were to do this, first I would learn HTML. (^-^) Then I would build a tool that strips out all of the actual content and saves the result as a file so it can be piped through WinDiff (or another merge tool).
Open each page in the browser and save them as .htm files. Compare the two using WinDiff.
