Limiting the length of an inline element - c#

How would you limit the length of a variable-length text element that can contain text attributes (<b>, <i>, <sup>, ...) and links? The tags would need to be preserved, both opening and closing, though an entire tag (both opening and closing) could be removed if it is at an appropriate position (I cannot remove all tags to simplify the problem). I have C#, XSLT, and CSS available to me. I would prefer not to do this with JavaScript.
For example:
On the <b>approximate realization</b> of continuous mappings by <i>neural networks</i> <a href='http://...very long link'>some text</a>...
Keep in mind that the tags themselves (and their attributes) should not count against the length.
Also, the text should wrap, so using width and overflow is out of the question.
Both Mrchief and Dimitre Novatchev have great solutions. I'm more fond of putting this logic in my XSLT, so I chose Dimitre Novatchev's as the answer, though both deserve to be.

I. XSLT 1.0 solution:
This XSLT 1.0 transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:key name="kTextById" match="text()" use="generate-id()"/>
<xsl:param name="pMaxLength" select="60"/>
<xsl:variable name="vTextToSplit">
<xsl:apply-templates select="(//text())[1]" mode="calc"/>
</xsl:variable>
<xsl:variable name="vsplitNode" select=
"key('kTextById', substring-before(substring-after($vTextToSplit,'|'), '|'))"/>
<xsl:variable name="vsplitLength" select=
"substring-before($vTextToSplit,'|')"/>
<xsl:variable name="vsplitPos" select=
"substring-after(substring-after($vTextToSplit,'|'),'|')"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<xsl:choose>
<xsl:when test="not(string($vTextToSplit))">
<xsl:copy-of select="."/>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="/node()"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="text()" mode="calc">
<xsl:param name="paccumLength" select="0"/>
<xsl:variable name="vPos" select="count(preceding::text())+1"/>
<xsl:variable name="vnewAccumLength" select=
"$paccumLength+string-length()"/>
<xsl:choose>
<xsl:when test="$vnewAccumLength >= $pMaxLength">
<xsl:value-of select=
"concat(string-length() - ($vnewAccumLength -$pMaxLength),
'|', generate-id(),
'|', $vPos
)"/>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates mode="calc"
select="(//text())[position() = $vPos+1]">
<xsl:with-param name="paccumLength" select="$vnewAccumLength"/>
</xsl:apply-templates>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="text()">
<xsl:variable name="vPos" select="count(preceding::text())+1"/>
<xsl:choose>
<xsl:when test="$vPos > $vsplitPos"/>
<xsl:when test="$vPos = $vsplitPos">
<xsl:value-of select="substring(.,1,$vsplitLength)"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="."/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
when applied on the provided input (wrapped into a single top element to make it a well-formed XML document):
<t>On the <b>approximate realization</b> of continuous mappings by <i>neural networks</i> <a href='http://...very long link'>some text</a>...</t>
produces the wanted, correct result -- a well-formed XML document that contains the elements of the source XML document and the total length of whose text nodes is equal exactly to the specified length (60) in the global parameter $pMaxLength:
<t>On the <b>approximate realization</b> of continuous mappings by <i>neu</i>
</t>
Explanation:
The global variable $vTextToSplit is calculated. It is a string containing three pipe-separated values: the number of characters of the "split node" that must be kept, the generate-id() of the "split node", and the ordinal position of the "split node" among all text nodes, in document order. The "split node" is the text node that contains the last character of the total string of text nodes to be generated.
The "split node", its length-to-keep and its position are extracted from $vTextToSplit into three corresponding global variables ($vsplitNode, $vsplitLength and $vsplitPos).
The template matching the root (/) of the document checks for the edge case when the total length of the text nodes is less than the specified wanted length. If so, the complete XML document is copied to the output. If not so, the processing continues by applying templates to its child nodes.
The identity rule copies all nodes "as-is".
The template matching any text node overrides the identity template. It processes the matched text node in one of three ways. If this text node has a smaller position than the "split node", it is copied entirely. If the matched node has a greater position than the "split node", its string value isn't copied. Finally, if this is the split node itself, only the first $vsplitLength characters of its string value are copied.
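The encoding of $vTextToSplit can be made concrete with a short sketch. The Python below is an illustration, not part of the XSLT answer. The function name find_split and the list-of-strings input are assumptions. It computes the same two numbers from the text nodes taken in document order: how many characters of the split node to keep, and the split node's 1-based position.

```python
def find_split(text_nodes, max_length):
    """Mirror the XSLT's "calc" mode over a list of text-node strings.

    Returns (keep_length, position): how many characters of the split node
    to keep and its 1-based position in document order. Returns None when
    the total text is shorter than max_length (nothing needs to be cut).
    """
    accum = 0
    for position, text in enumerate(text_nodes, start=1):
        new_accum = accum + len(text)
        if new_accum >= max_length:
            # same formula as string-length() - ($vnewAccumLength - $pMaxLength)
            return len(text) - (new_accum - max_length), position
        accum = new_accum
    return None


nodes = ["On the ", "approximate realization", " of continuous mappings by ",
         "neural networks", " ", "some text", "..."]
print(find_split(nodes, 60))  # (3, 4): keep "neu" of the 4th text node
```

Text nodes before position 4 are copied whole, and those after it are dropped, which is what the text() template does with $vsplitPos.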
II. XSLT 2.0 solution:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes"/>
<xsl:param name="pMaxLength" select="60"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match=
"text()[not(sum((.|preceding::text())/string-length(.))
gt
$pMaxLength)
]">
<xsl:copy-of select="."/>
</xsl:template>
<xsl:template match=
"text()[sum(preceding::text()/string-length(.))
gt
$pMaxLength
]"/>
<xsl:template match=
"text()[sum((.|preceding::text())/string-length(.))
gt
$pMaxLength
and
not(sum(preceding::text()/string-length(.))
gt
$pMaxLength)
]">
<xsl:variable name="vprevLength" select=
"sum(preceding::text()/string-length(.))"/>
<xsl:variable name="vremainingLength" select=
"$pMaxLength - $vprevLength"/>
<xsl:copy-of select="substring(.,1,$vremainingLength)"/>
</xsl:template>
</xsl:stylesheet>
when applied to the same source XML document (given above), the same correct result is produced:
<t>On the <b>approximate realization</b> of continuous mappings by <i>neu</i><a href="http://...very long link"/></t>
Note on performance: Both solutions presented will be slow for big XML documents. One way to avoid this is to use the scanl() function/template of FXSL. I will provide this third solution later, when I have more free time.
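Both stylesheets are slow because they re-sum preceding::text() for every text node. A single pass avoids this: walk the text in document order once, with a running budget, so each text node is visited exactly once. The Python sketch below shows that idea using xml.etree.ElementTree, where character data lives in .text and .tail rather than in separate text nodes. It only illustrates the single-pass idea and is not the FXSL scanl() solution. The name truncate_text is hypothetical.

```python
import xml.etree.ElementTree as ET

def truncate_text(root, max_length):
    """Cut the character data under root to max_length characters, in place.

    Elements are never removed, so every tag stays balanced; text after the
    cut point becomes empty, like the XSLT 2.0 templates above.
    """
    remaining = max_length

    def clip(s):
        nonlocal remaining
        if s is None:
            return None
        kept = s[:remaining]
        remaining -= len(kept)
        return kept

    def walk(elem):
        # document order: own text, then each child subtree followed by its tail
        elem.text = clip(elem.text)
        for child in elem:
            walk(child)
            child.tail = clip(child.tail)

    walk(root)
    return root


doc = ET.fromstring(
    "<t>On the <b>approximate realization</b> of continuous mappings by "
    "<i>neural networks</i> <a href='http://...very long link'>some text</a>...</t>"
)
truncate_text(doc, 60)
print(ET.tostring(doc, encoding="unicode"))
```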

Here's an attempted solution:
using System.Text.RegularExpressions;

public static class TextLimiter
{
    // Matches opening and closing tags such as <b>, </i> and <a href='http://...'>
    private static readonly Regex TagRegex =
        new Regex(@"(</?[a-zA-Z0-9 '=:/.]+>)", RegexOptions.Compiled);

    public static string LimitText(string input, int width)
    {
        // remove tags; if the plain text already fits there is nothing to cut
        var plain = TagRegex.Replace(input, string.Empty);
        if (plain.Length <= width) return input;
        // chop the plain text to the requested width
        var result = plain.Substring(0, width);
        // back up to a word boundary so that a word isn't cut in half
        var lastSpace = result.LastIndexOf(' ');
        if (lastSpace > 0) result = result.Substring(0, lastSpace);
        // re-insert the tags at their original positions
        // (non-LINQ version to keep things simple)
        foreach (Match match in TagRegex.Matches(input))
        {
            if (match.Index > result.Length) break;
            result = result.Insert(match.Index, match.Value);
        }
        // check for unbalanced tags
        var matches = TagRegex.Matches(result);
        if (matches.Count % 2 != 0)
        {
            // chop off the last (unbalanced) tag and everything after it
            return result.Substring(0, matches[matches.Count - 1].Index);
        }
        return result;
    }
}
Caveats:
The regex matches all the tags specified in your post. You can expand on it to include more characters based on your real scenario. However, parsing HTML with regex is always going to be tricky.
If your input string doesn't contain balanced tags (i.e. for each opening tag there is a closing tag), this may not work as expected.
If you expect self-closing tags (like <br />) or unclosed tags such as <input> in your input string, then a pre-flight sanitization is needed. The idea is the same: get a group of matches, run the text through LimitText, and re-insert these tags into the result string.
The final rendered text in the browser may still not be satisfactory, as font size or screen resolution may produce unexpected results. For this, you need to resort to a JavaScript-based solution.
This should get you started and then you can expand upon it for any corner cases.

A site's design should never limit what information you can place within a container. You could set a max-height on the element and use CSS to allow full height on hover; you'd have to use some positioning and possibly some negative margin to prevent other elements from jumping about.
Alternatively, you could use the CSS text-overflow property, but as far as I know it is not fully implemented yet. Strangely, it is supposedly supported in IE6 and later!
The regex solution is difficult: you need to find the position of the end of the text with the tags stripped out, cut the remainder of the string with the tags in, and append any unclosed tags. A tricky one!

Related

Removing non-printing characters from XML text (or any string)

I'm getting an XML document back from a company and it has embedded tabs, newlines and other non-printing garbage in it. Is there some method in the framework that will take such a string and remove these unwanted characters? Some screenshots are below. These are not debugger/visualiser artefacts, as the characters actually come into play when I do string compares.
Example #1:
Example #2:
FWIW, these XML documents come from UTF-8 decoding the response to a web request.
EDIT 2014-09-03 20:20 IST
In response to comments below from @CodeCaster: I upload values (in the form of a NameValueCollection) using an instance of a WebClient. The response comes back to me and I do the following:
string reply = System.Text.Encoding.UTF8.GetString(response);
XmlNamespaceManager xmlNamespaceManager = new XmlNamespaceManager(new NameTable());
xmlNamespaceManager.AddNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
XmlDocument xmlDocument = new XmlDocument();
xmlDocument.LoadXml(reply);
It is this xmlDocument that has the offending characters throughout.
That's a trivial task for XSLT.
This XSLT stylesheet normalizes (removes excessive whitespace from) all text nodes from the input XML document, leaving everything else untouched.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="node() | @*">
<xsl:copy>
<xsl:apply-templates select="node() | @*" />
</xsl:copy>
</xsl:template>
<xsl:template match="text()">
<xsl:value-of select="normalize-space()" />
</xsl:template>
</xsl:stylesheet>
Use the XslCompiledTransform class to apply it to your input XML.
Be aware that whitespace may sometimes carry meaning. Clobbering all of it might be counter-productive.
When in doubt, adapt the match expression (<xsl:template match="text()">) to something more specific (like <xsl:template match="message//text()"> or <xsl:template match="status/text()">) to affect only those text nodes that you really want to straighten out.
Of course, you can achieve the same effect by applying a regular expression to the offending string value after you have extracted it from the document:
return Regex.Replace(value, @"\s+", " ").Trim();
Using XSLT to clean up the input XML up-front in one step might be more convenient.
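You can check what the identity-plus-normalize-space() stylesheet does to each text node without running an XSLT processor. The Python sketch below uses xml.etree.ElementTree, and its function names are hypothetical. It follows XPath's definition of whitespace (space, tab, CR, LF), which is narrower than .NET's \s used in the regex alternative. Like the stylesheet, it leaves attribute values untouched.

```python
import re
import xml.etree.ElementTree as ET

# normalize-space() only treats space, tab, CR and LF as whitespace
_XML_WS = re.compile(r"[ \t\r\n]+")

def normalize_space(s):
    """Collapse whitespace runs to one space and trim both ends."""
    return _XML_WS.sub(" ", s).strip(" ")

def normalize_text_nodes(root):
    """Apply normalize_space to every text/tail string; attributes are not touched."""
    for elem in root.iter():
        if elem.text is not None:
            elem.text = normalize_space(elem.text)
        if elem.tail is not None:
            elem.tail = normalize_space(elem.tail)
    return root


doc = ET.fromstring('<reply code=" 200 ">\n\t<status>  OK\t\n</status>\n\t'
                    '<message>Line one\n   line two</message>\n</reply>')
normalize_text_nodes(doc)
print(ET.tostring(doc, encoding="unicode"))
```

Whitespace-only text between elements becomes empty, just as normalize-space() turns it into an empty string in the XSLT version.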

String.Format("Your query {0} with {1} Placeholders", theQuery, results.Count) Equivalent in XSLT

Is there an equivalent function to string format in XSLT?
I'm working on a multilingual site in Umbraco. I don't know which languages will be needed but, languages being as they are, one language could order the words differently, e.g.
English "Your query 'Duncan' matched 5 results." could translate word for word to
"5 results matched 'Duncan' query".
For this reason, having an item for "Your query", "matched" and "results" in my Umbraco translation isn't feasible. If I were to make this a C# user control, I would have the translator provide a dictionary item like "Your query '{0}' matched {1} results".
This is a close analog in XSLT:
The dictionary entry has the following format:
<t>Your query "<query/>" matched <nResults/> results</t>
The transformation (corresponding to string.format()) is very simple:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:param name="pQuery" select="'XPath and XSLT'"/>
<xsl:param name="pNumResults" select="3"/>
<xsl:template match="query">
<xsl:value-of select="$pQuery"/>
</xsl:template>
<xsl:template match="nResults">
<xsl:value-of select="$pNumResults"/>
</xsl:template>
</xsl:stylesheet>
and it produces the wanted, correct result:
Your query "XPath and XSLT" matched 3 results
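The same placeholder-element idea can be tried outside XSLT. The Python sketch below uses xml.etree.ElementTree. The function fill_template and the values dictionary are hypothetical. Each empty placeholder element is replaced by its value, and all other text is emitted in document order, like the built-in templates in the stylesheet. This lets a translator move <query/> and <nResults/> anywhere in the sentence.

```python
import xml.etree.ElementTree as ET

def fill_template(entry_xml, values):
    """Render a dictionary entry such as <t>Your query "<query/>" matched <nResults/> results</t>.

    Placeholder elements whose tag is a key of `values` are replaced by that
    value; all other text is emitted in document order.
    """
    root = ET.fromstring(entry_xml)
    parts = []

    def walk(elem):
        if elem is not root and elem.tag in values:
            parts.append(str(values[elem.tag]))
        else:
            parts.append(elem.text or "")
            for child in elem:
                walk(child)
                parts.append(child.tail or "")

    walk(root)
    return "".join(parts)


english = '<t>Your query "<query/>" matched <nResults/> results</t>'
print(fill_template(english, {"query": "XPath and XSLT", "nResults": 3}))
# Your query "XPath and XSLT" matched 3 results
```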
You could extend your XSLT with a custom function:
http://our.umbraco.org/wiki/reference/xslt/extend-your-xslt-with-custom-functions.

C# consolidate namespace references in xml

I have xml formatted by the atom formatter.
The atom formatter seems to specify namespaces inline multiple times.
Is there any way to easily consolidate these?
The example below shows namespaces specified three times for each property.
This is horrible.
I would like prefixes at the top of the document and no namespaces in the document (just prefixes). Is there a writer or formatter option to achieve this?
<property p3:name="firstname" xmlns:p3="http://a9.com/-/opensearch/extensions/property/1.0/" xmlns="http://a9.com/-/opensearch/extensions/property/1.0/">Drikie</property>
Thanks
Craig.
The easiest way to produce this more compact format is to apply the following XSLT transformation on your XML document:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="node()[not(self::*)]|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*">
<xsl:element name="{name()}" namespace="{namespace-uri()}">
<xsl:copy-of select="descendant::*/namespace::*"/>
<xsl:copy-of select="namespace::*"/>
<xsl:apply-templates select="node()|@*"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
For example, when applied on the following XML document (based on your question):
<t xmlns="http://a9.com/-/opensearch/extensions/property/1.0/">
<property p3:name="firstname"
xmlns:p3="http://a9.com/-/opensearch/extensions/property/1.0/"
xmlns="http://a9.com/-/opensearch/extensions/property/1.0/"
>Drikie</property>
</t>
the wanted result is produced:
<t
xmlns="http://a9.com/-/opensearch/extensions/property/1.0/"
xmlns:p3="http://a9.com/-/opensearch/extensions/property/1.0/">
<property p3:name="firstname">Drikie</property>
</t>
Do note:
A namespace declaration cannot be promoted further above an element that has a declaration that binds the same prefix to another namespace.
Promoting a namespace declaration to an ancestor element may increase the size of the parsed XML document, because all namespace nodes are propagated down to all descendant nodes, some of which may not need that namespace at all.
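The same consolidation can be reached outside XSLT. Python's xml.etree.ElementTree always writes its namespace declarations on the root element when it serializes, so parsing and re-serializing the sample yields the consolidated form. This is a Python sketch, not a .NET writer option. The prefix p3 is registered explicitly so that ElementTree does not invent ns0. Unlike the XSLT output, the elements also get the prefix instead of a default namespace.

```python
import xml.etree.ElementTree as ET

PROPERTY_NS = "http://a9.com/-/opensearch/extensions/property/1.0/"

# Global registration: ElementTree will use "p3" instead of an auto-generated "ns0"
ET.register_namespace("p3", PROPERTY_NS)

source = (
    '<t xmlns="http://a9.com/-/opensearch/extensions/property/1.0/">'
    '<property p3:name="firstname" '
    'xmlns:p3="http://a9.com/-/opensearch/extensions/property/1.0/" '
    'xmlns="http://a9.com/-/opensearch/extensions/property/1.0/">Drikie</property>'
    '</t>'
)

# Parsing resolves every prefix to a {uri}name pair; serializing emits the
# xmlns declarations once, on the root element
consolidated = ET.tostring(ET.fromstring(source), encoding="unicode")
print(consolidated)
```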

How to determine the right XML to write out

<?xml version="1.0" encoding="UTF-8"?>
<idmef:IDMEF-Message version="1.0" xmlns:idmef="http://iana.org/idmef">
<idmef:Alert messageid="abc123456789">
<idmef:Analyzer analyzerid="bc-corr-01">
<idmef:Node category="dns">
<idmef:name>correlator01.example.com</idmef:name>
</idmef:Node>
</idmef:Analyzer>
<idmef:CreateTime ntpstamp="0xbc72423b.0x00000000">2000-03-09T15:31:07Z
</idmef:CreateTime>
<idmef:Source ident="a1">
<idmef:Node ident="a1-1">
<idmef:Address ident="a1-2" category="ipv4-addr">
<idmef:address>192.0.2.200</idmef:address>
</idmef:Address>
</idmef:Node>
</idmef:Source>
<idmef:Target ident="a2">
<idmef:Node ident="a2-1" category="dns">
<idmef:name>www.example.com</idmef:name>
<idmef:Address ident="a2-2" category="ipv4-addr">
<idmef:address>192.0.2.50</idmef:address>
</idmef:Address>
</idmef:Node>
<idmef:Service ident="a2-3">
<idmef:portlist>5
</idmef:portlist>
</idmef:Service>
</idmef:Target>
<idmef:Classification text="Login Authentication">
<idmef:Reference origin="vendor-specific">
<idmef:name>portscan</idmef:name>
<idmef:url>http://www.vendor.com/portscan</idmef:url>
</idmef:Reference>
</idmef:Classification>
<idmef:Assessment>
<idmef:Impact severity ="high" completion ="failed" type ="file" >
</idmef:Impact>
</idmef:Assessment>
</idmef:Alert>
</idmef:IDMEF-Message>
I'm working with an XML messaging system, where a message packet is read from a queue and applied against a rule with a pattern in it. If the pattern matches, the rule fires and some elements, nodes, etc. of the XML are read and stored. What is to be read from the message is defined using an XPath expression. For example, the following XPath takes the severity attribute and stores it:
name.set(".//idmef:Classification/idmef:Assesment/idmef:Impact/@severity","high");
So I would take that XPath, compile it, read the severity attribute, and store it for later use.
When I go to create the new XML message using the stored value, there may be cases where the completion and type attributes are mandatory.
So the question is: how do I check whether those attributes need to be written out? I know that the schema is involved somehow, but how do you do it? More to the point, if the user selects only the severity attribute, how would I go about adding in the rest of the structure, like Classification, Message and other elements, when I have additional XPath lookups, for example down at
Bob.
The commenters are correct - you need to first fix your XML to make it well formed.
However, if I understand your problem correctly, you need to write out some XML, adding or changing some attributes.
If this is what you need, I would try using an XSL transform to add the attributes.
Here is a modified version of the identity transform that should be close to what you need.
If you need some conditional logic, surround the xsl:attribute instructions with xsl:if.
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions"
xmlns:idmef="http://iana.org/idmef" xpath-default-namespace="http://iana.org/idmef">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Impact">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:attribute name="severity">high</xsl:attribute>
<xsl:attribute name="completion">failed</xsl:attribute>
<xsl:attribute name="type">file</xsl:attribute>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
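If the rule engine is written in a general-purpose language, the same "copy everything, then set the Impact attributes" step can be applied directly to a parsed tree. The Python sketch below uses xml.etree.ElementTree. The function set_impact_attributes and its only_missing flag are hypothetical. The flag plays the role of wrapping each xsl:attribute in xsl:if.

```python
import xml.etree.ElementTree as ET

IDMEF_NS = "http://iana.org/idmef"

def set_impact_attributes(root, values, only_missing=False):
    """Set attributes on every idmef:Impact element under root.

    values maps attribute names to values, e.g. {"severity": "high"}.
    With only_missing=True, attributes that are already present are kept.
    """
    for impact in root.iter(f"{{{IDMEF_NS}}}Impact"):
        for name, value in values.items():
            if only_missing and impact.get(name) is not None:
                continue
            impact.set(name, value)
    return root
```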
You could:
Open original XML (A)
Create a new XML document (B)
Run your xpath against (A)
Add matching results to (B)
Save (B)
Does this make sense?
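Step 4 ("add matching results to B") is where missing ancestors such as Alert and Assessment must be created. The Python sketch below makes several assumptions. Paths are plain child steps only, with no predicates or axes. Names use ElementTree's {uri}name notation. The helpers q and ensure_path are hypothetical. The code walks a list of tag names and creates each element that does not exist yet.

```python
import xml.etree.ElementTree as ET

IDMEF_NS = "http://iana.org/idmef"
NS = {"idmef": IDMEF_NS}

def q(tag):
    """Clark-notation name in the IDMEF namespace, e.g. {http://iana.org/idmef}Alert."""
    return f"{{{IDMEF_NS}}}{tag}"

def ensure_path(root, tags):
    """Return the element at root/tags[0]/tags[1]/..., creating missing ones."""
    node = root
    for tag in tags:
        child = node.find(tag)  # first direct child with this name, or None
        if child is None:
            child = ET.SubElement(node, tag)
        node = child
    return node

# (A): the incoming message
message_a = ET.fromstring(
    '<idmef:IDMEF-Message xmlns:idmef="http://iana.org/idmef"><idmef:Alert>'
    '<idmef:Assessment><idmef:Impact severity="high" completion="failed" type="file"/>'
    '</idmef:Assessment></idmef:Alert></idmef:IDMEF-Message>'
)
severity = message_a.find(".//idmef:Impact", NS).get("severity")

# (B): the new message, built only from the stored value
message_b = ET.Element(q("IDMEF-Message"))
ensure_path(message_b, [q("Alert"), q("Assessment"), q("Impact")]).set("severity", severity)
```

Mandatory attributes such as completion and type still have to come from somewhere, for example the schema or defaults chosen by the rule.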
I found an answer here on Stack Overflow: Create XML Nodes from XPath. I know it is quite far from how I described it above, but at the time I was designing it, I didn't have a clue how it would work.

Transforming flat file to XML using XSLT-like technology

I'm designing a system which is receiving data from a number of partners in the form of CSV files. The files may differ in the number and ordering of columns. For the most part, I will want to choose a subset of the columns, maybe reorder them, and hand them off to a parser. I would obviously prefer to be able to transform the incoming data into some canonical format so as to make the parser as simple as possible.
Ideally, I would like to be able to generate a transformation for each incoming data format using some graphical tool and store the transformation as a document in a database or on disk. Upon receipt of data, I would apply the correct transformation (never mind how I determine the correct transformation) to get an XML document in a canonical format. If the incoming files had contained XML, I would just have created an XSLT document for each format and been on my way.
I've used BizTalk's Flat File XSLT Extensions (or whatever they are called) for something similar in the past, but I don't want the hassle of BizTalk (and I can't afford it either) on this project.
Does anyone know if there are alternative technologies and/or XSLT extensions which would enable me to achieve my goal in an elegant way?
I'm developing my app in C# on .NET 3.5 SP1 (thus would prefer technologies supported by .NET).
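XSLT 2.0 provides new features, such as the unparsed-text() function and regular-expression support (tokenize(), xsl:analyze-string), that make it easier to parse non-XML files.
Andrew Welch posted an XSLT 2.0 example that converts CSV into XML.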
I think you need something like this (sorry, it's not supported by .NET, but the code is very simple):
http://csv2xml.sourceforge.net
IIRC someone has created a "LINQ to CSV" library that might be a starting point to create the intermediate XML (in memory) as input into the transform.
Found it here.
You might try LINQ to CSV. There is one offering from Microsoft's Eric White and another from Matt Perdeck. Others are out there...
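The "intermediate XML" idea can be sketched without any third-party library. The Python below is an illustration under stated assumptions. Each partner format is described by a hypothetical column map (canonical element name → partner column header), which could be stored on disk or in a database. The output names records and record are made up. The resulting tree could then feed an ordinary XML-to-XML transform.

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv_to_canonical_xml(text, column_map):
    """Turn partner CSV text into a canonical <records> tree.

    column_map maps each canonical element name to the partner's column
    header; its order is the output order, and unmapped columns are dropped.
    """
    root = ET.Element("records")
    for row in csv.DictReader(io.StringIO(text)):
        record = ET.SubElement(root, "record")
        for canonical_name, partner_column in column_map.items():
            ET.SubElement(record, canonical_name).text = row.get(partner_column) or ""
    return root


partner_a = "Full Name,Ignored,CustomerNo\nAda Lovelace,x,42\n"
tree = csv_to_canonical_xml(partner_a, {"id": "CustomerNo", "name": "Full Name"})
print(ET.tostring(tree, encoding="unicode"))
```

Supporting a new partner then means adding a new column map rather than new code.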
I have found 2 potential solutions when looking into a similar problem space.
Progress Software has a set of tools and an API (.NET) which, when used in conjunction with .conv (flat-to-XML converter) files created in their Stylus Studio tool, allows for transformation of any pre-defined flat file format into XML at run time. More info here: http://www.datadirect.com/developer/data-integration/tutorials/converter-sample-code/index.ssp
In addition, there is an XML format called XFLAT which allows for the description of flat files in a variety of formats (delimited, fixed width, etc.). There is a Java program which will convert flat files for which you've provided the XFLAT description into XML, so that you can continue with a standard XML-to-XML XSLT transformation. More details can be found here: http://www.unidex.com/overview.htm
I have never actually used either of these tools, but found them when researching a similar problem.
Check out this article on implementing an XmlReader that processes non-XML input. It's not a terrifically difficult task, and once you've got it working you don't need to use an XSLT-like technology, you can just use XSLT.
This will parse the output from the Linux ip route list command. It's just what I had lying around.
You must wrap the output from the command in an element called 'output' and the stylesheet will take it from there. The real key here is the tokenize() function in the XPath 2.0 spec; I don't know how you could do this before that. Also, this doesn't produce a single root element, as that was not what I needed it for. In your case, instead of splitting on spaces, I'd split on ','.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>
  <xsl:template match="//output">
    <!-- split things up for each new line -->
    <xsl:variable name="line" select="tokenize(.,'\n')"/>
    <xsl:for-each select="$line">
      <!-- split each line into pieces based on spaces -->
      <xsl:variable name="split" select="tokenize(.,' +')"/>
      <xsl:if test="count($split) > 1">
        <xsl:element name="route">
          <xsl:for-each select="$split">
            <xsl:choose>
              <xsl:when test="position() = 1">
                <!-- the first token is the destination address -->
                <xsl:attribute name="address" select="."/>
              </xsl:when>
              <xsl:otherwise>
                <!-- even positions hold a field name; the next token is its value -->
                <xsl:variable name="index" select="position()"/>
                <xsl:variable name="fieldName" select="."/>
                <xsl:if test="$fieldName and position() mod 2 = 0">
                  <xsl:attribute name="{$fieldName}" select="$split[$index + 1]"/>
                </xsl:if>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each>
        </xsl:element>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
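For comparison, here is the same tokenizing logic (lines, then space-separated tokens, then name/value pairs) sketched in Python with xml.etree.ElementTree. The name parse_routes is hypothetical. The sketch splits on any whitespace and skips empty tokens, which differs slightly from tokenize(., ' +'). Like the stylesheet, it does not add a single root element and returns a list of route elements.

```python
import xml.etree.ElementTree as ET

def parse_routes(output):
    """Turn `ip route list` text into <route> elements, one per non-trivial line."""
    routes = []
    for line in output.split("\n"):
        fields = line.split()  # any whitespace; empty tokens are dropped
        if len(fields) <= 1:
            continue  # same as the count($split) > 1 test
        # the first token is the destination, as in the position() = 1 branch
        route = ET.Element("route", address=fields[0])
        # tokens at odd indices (XSLT positions 2, 4, ...) are field names,
        # and the token after each one is its value
        for i in range(1, len(fields), 2):
            value = fields[i + 1] if i + 1 < len(fields) else ""
            route.set(fields[i], value)
        routes.append(route)
    return routes


sample = ("default via 192.168.1.1 dev eth0 proto static\n"
          "192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.10\n")
for route in parse_routes(sample):
    print(ET.tostring(route, encoding="unicode"))
```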
You can also take a look at Altova's MapForce.
