Find translation unit pairs in tmx file - c#

I have a Translation Memory which is essentially an XML file based on Translation Memory eXchange format specifications and I am trying to find a specific translation unit for editing. This is an example of the structure:
<?xml version="1.0" encoding="utf-8"?>
<tmx version="1.4">
<header creationtool="xxx" .... />
<body>
<tu tuid="1">
<tuv xml:lang="en-US">
<seg>sample source</seg>
</tuv>
<tuv xml:lang="de-DE">
<seg>sample translation</seg>
</tuv>
</tu>
<tu tuid="2">
<tuv xml:lang="en-US">
<seg>Address</seg>
</tuv>
<tuv xml:lang="de-DE">
<seg>Adresse</seg>
</tuv>
</tu>
.....
</body>
</tmx>
I want to find all the translation units (tu) that contain both a specific source text and a specific target text. For example, I want every translation unit that has a tuv whose xml:lang attribute is "en-US" and whose seg value is "sample source", together with a tuv whose xml:lang attribute is "de-DE" and whose seg value is "sample translation". That is, I want to find:
<tu tuid="18">
<tuv xml:lang="en-US">
<seg>sample source</seg>
</tuv>
<tuv xml:lang="de-DE">
<seg>sample translation</seg>
</tuv>
</tu>
It is also possible that more than one translation unit (tu) fits the criteria, that is, there may be duplicates in the translation memory.
I have tried to get a collection I could iterate through e.g.
XElement root = XElement.Load(@"sample.tmx");
IEnumerable<XElement> translationUnits =
from el in root.Elements("tu")
where
(from tuv in el.Elements("tuv")
where
(string)tuv.Attribute(XNamespace.Xml + "lang") == "en-US" &&
(string)tuv.Element("seg") == "sample source"
select tuv)
.Any()
select el;
foreach (XElement el in translationUnits)
Console.WriteLine((string)el.Attribute("tuid"));
I am obviously doing something wrong, although I think I am on the right track. Once I have the collection, I want to update the target translation.

For future reference, the way I eventually solved this was with XmlDocument:
XmlDocument document = new XmlDocument();
document.Load(this.fileName);
string nodeSelect = "/tmx/body/tu/tuv[lang('" + this.sourceLanguage + "') and seg = '" + this.originalSourceText + "']";
XmlNodeList nodes = document.DocumentElement.SelectNodes(nodeSelect);
foreach (XmlNode node in nodes) {
XmlNode parent = node.ParentNode;
foreach (XmlNode translationNode in parent) {
string searchNode = "*[lang('" + this.targetLanguage + "') and //seg = '" + this.originalTranslationText + "']";
XmlNode test = translationNode.SelectSingleNode(searchNode);
if (test != null) {
if (test.InnerText.Equals(this.originalTranslationText, StringComparison.Ordinal)) {
test.InnerText = this.newTranslation;
}
}
}
}
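For comparison, the same lookup can be sketched with LINQ to XML. This is only a sketch under assumptions: the sample TMX is inlined as a string, and `FindSeg` and the search variables are placeholder names that do not appear in the original code. The main difference from the query in the question is that the tu elements are children of body, not of the root tmx element, so `Descendants("tu")` is used instead of `root.Elements("tu")`.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Inlined sample so the sketch is self-contained; a real file would be loaded with XDocument.Load.
string tmx = @"<tmx version=""1.4""><body>
<tu tuid=""1""><tuv xml:lang=""en-US""><seg>sample source</seg></tuv><tuv xml:lang=""de-DE""><seg>sample translation</seg></tuv></tu>
<tu tuid=""2""><tuv xml:lang=""en-US""><seg>Address</seg></tuv><tuv xml:lang=""de-DE""><seg>Adresse</seg></tuv></tu>
</body></tmx>";

XDocument doc = XDocument.Parse(tmx);
XName lang = XNamespace.Xml + "lang";

// Placeholder search values (assumptions for illustration).
string sourceLang = "en-US", sourceText = "sample source";
string targetLang = "de-DE", targetText = "sample translation";
string newTranslation = "new translation";

// Returns the <seg> of the <tuv> in this <tu> that has the given language and text, or null.
XElement FindSeg(XElement tu, string language, string text) =>
    tu.Elements("tuv")
      .Where(tuv => (string)tuv.Attribute(lang) == language)
      .Select(tuv => tuv.Element("seg"))
      .FirstOrDefault(seg => (string)seg == text);

// A unit matches only if it contains both the source and the target segment.
var matches = doc.Descendants("tu")
    .Where(tu => FindSeg(tu, sourceLang, sourceText) != null
              && FindSeg(tu, targetLang, targetText) != null)
    .ToList();

// Update the target segment of every matching unit (duplicates included).
foreach (XElement tu in matches)
{
    FindSeg(tu, targetLang, targetText).Value = newTranslation;
    Console.WriteLine((string)tu.Attribute("tuid"));
}
```

Materializing the matches with `ToList()` before editing keeps the query from being re-evaluated while segment values are changing.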

Related

c# - Get values from XML Nodes with more than one value

First of all, I would like to point out that I'm new to programming.
Here is my question.
I'm having trouble getting the values of a node with more than one value.
I'm using Xml.Linq.
Example of my XML:
<root>
<ManufactureID>test</ManufactureID>
<Part>21034015</Part>
<Fixture>Erowa</Fixture>
<Material>CrCo</Material>
<ImplantIndex IMP="IMP1">
<Position x="26,61927" y="3,666112" z="-13,54083"/>
<Direction x="0,7169617301164524" y="0,41536091911417444" z="-0,5598581824185941"/>
<Xaxis x="0,7169617301164524" y="0,41536091911417444" z="-0,5598581824185941"/>
<Yaxis x="0,4630894965759858" y="0,31652069765969354" z="0,8278663938788802"/>
<Zaxis x="0,52107004875489" y="-0,8528129659108433" z="0,034583948081838636"/>
</ImplantIndex>
<ImplantIndex IMP="IMP2">
<Position x="27,20444" y="3,832021" z="-5,81747"/>
<Direction x="0,5516120001302346" y="0,2908829003330433" z="-0,7817361061164817"/>
<Xaxis x="0,5516120001302346" y="0,2908829003330433" z="-0,7817361061164817"/>
<Yaxis x="0,7202426402494431" y="0,30658331713284814" z="0,6222999347760941"/>
<Zaxis x="0,420683658440441" y="-0,9063077887504092" z="-0,04039123136907434"/>
</ImplantIndex>
</root>
I have no problem getting the values of the Part, Fixture or Material nodes.
But to get the x/y/z values of Position and Direction I'm currently using:
string position = doc.Root.Element("ImplantIndex").Element("Position").ToString();
string[] posTokens = position.Split('"');
Console.WriteLine(double.Parse(posTokens[1]));
Console.WriteLine(double.Parse(posTokens[3]));
Console.WriteLine(double.Parse(posTokens[5]));
Can anyone suggest a better way of doing that last part?
Thank you in advance.
This is how you normally access an attribute value on a node:
XmlDocument doc = new XmlDocument();
doc.LoadXml("<book genre='novel' ISBN='1-861001-57-5'>" +
"<title>Pride And Prejudice</title>" +
"</book>");
XmlElement root = doc.DocumentElement;
// Check to see if the element has a genre attribute.
if (root.HasAttribute("genre")){
String genre = root.GetAttribute("genre");
Console.WriteLine(genre);
}
or using XElement
XElement root = XElement.Load("PurchaseOrder.xml");
IEnumerable<XElement> address =
from el in root.Elements("Address")
where (string)el.Attribute("Type") == "Billing"
select el;
foreach (XElement el in address)
Console.WriteLine(el);
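Applied to the XML in the question, reading the attributes directly might look like the sketch below. It is an illustration only: the XML is inlined and shortened, and `Read`, `commaFormat` and `positions` are hypothetical names. Because the values use a decimal comma, the sketch parses them with an explicit NumberFormatInfo rather than relying on the machine's current culture.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Shortened copy of the question's XML, inlined so the sketch is self-contained.
string xml = @"<root>
  <ImplantIndex IMP=""IMP1"">
    <Position x=""26,61927"" y=""3,666112"" z=""-13,54083""/>
  </ImplantIndex>
  <ImplantIndex IMP=""IMP2"">
    <Position x=""27,20444"" y=""3,832021"" z=""-5,81747""/>
  </ImplantIndex>
</root>";

XDocument doc = XDocument.Parse(xml);

// Number format that treats ',' as the decimal separator.
var commaFormat = new NumberFormatInfo { NumberDecimalSeparator = ",", NumberGroupSeparator = "." };

// Reads one attribute of an element as a double using the comma format.
double Read(XElement element, string attribute) =>
    double.Parse((string)element.Attribute(attribute), NumberStyles.Float, commaFormat);

// One entry per <ImplantIndex>, with the Position coordinates as numbers.
var positions = doc.Root.Elements("ImplantIndex")
    .Select(imp => new
    {
        Id = (string)imp.Attribute("IMP"),
        X = Read(imp.Element("Position"), "x"),
        Y = Read(imp.Element("Position"), "y"),
        Z = Read(imp.Element("Position"), "z")
    })
    .ToList();

foreach (var p in positions)
    Console.WriteLine(string.Format(CultureInfo.InvariantCulture, "{0}: {1} {2} {3}", p.Id, p.X, p.Y, p.Z));
```

The same `Read` helper would work for Direction, Xaxis, Yaxis and Zaxis, since they carry the same three attributes.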

I am having an xml inside xml and want to test a condition is met inside the inner xml. C# solution required

<?xml version="1.0"?>
<TextType IsKey="false" Name="XMLReport"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Providers
xmlns="Reporting"/>
<Sales
xmlns="Reporting"/>
<Value
xmlns="Reporting">
<?xml version="1.0" encoding="utf-8"?>
<TestReport>
<StudyUid>
<![CDATA[123]]>
</StudyUid>
<Modality>
<![CDATA[XYZ]]>
</Modality>
<StudyDate format="DICOM">123456</StudyDate>
<StudyTime format="DICOM">6789</StudyTime>
<AccessionNumber>
<![CDATA[123]]>
</AccessionNumber>
<StudyDescription>
<![CDATA[abc def]]>
</StudyDescription>
<OperatorName format="xyz">
<![CDATA[abc]]>
</OperatorName>
<PhysicianReadingStudy format="xyz">
<![CDATA[^^^^]]>
</PhysicianReadingStudy>
<InstitutionName>
<![CDATA[xyz]]>
</InstitutionName>
<HospitalName>
<![CDATA[Hospital Name]]>
</HospitalName>
<ReportSet>
<MyReport ID="1">
<ReportStatus>
<![CDATA[Done]]>
</ReportStatus>
</MyReport>
<MyReport ID="2">
<ReportStatus>
<![CDATA[Done]]>
</ReportStatus>
</MyReport>
<MyReport ID="3">
<ReportStatus>
<![CDATA[Initial]]>
</ReportStatus>
</MyReport>
</ReportSet>
<ReportImageSet />
<FetusSet />
</TestReport>
</Value>
<WhoSetMe xmlns="Reporting">NotSpecified
</WhoSetMe>
</TextType>
I want to parse the XML above in C# and check whether "ReportStatus" is "Done" for every ReportStatus under ReportSet/MyReport. One more twist is that the XML contains a second XML document, starting inside the "Value" tag, as in the example above. It may contain many ReportStatus tags under the ReportSet tag. Can someone please help me?
// Can you try this? I tried to do it with LINQ to XML.
// I assume you have multiple <TestReport /> elements in the <Value /> tag,
// that xelement is your XML variable, and that the inner XML declaration has already been removed.
// Elements() takes a single element name rather than a path, so Descendants() is used here.
// Everything under <Value> inherits the default namespace "Reporting".
XNamespace ns = "Reporting";
// First we get all TestReport elements
IEnumerable<XElement> allReports = xelement.Descendants(ns + "TestReport");
// From allReports we get all MyReport elements
IEnumerable<XElement> allMyReports =
from el in allReports.Elements(ns + "ReportSet").Elements(ns + "MyReport")
select el;
// From allMyReports we get the MyReport elements whose ReportStatus equals "Done"
// (Trim() drops the whitespace around the CDATA section)
IEnumerable<XElement> allDoneMyReports =
from el in allMyReports
where ((string)el.Element(ns + "ReportStatus") ?? "").Trim() == "Done"
select el;
// Now we compare allMyReports with allDoneMyReports
if (allMyReports.Count() == allDoneMyReports.Count())
{
// Do something
}
Your XML document is invalid. You need to fix it before trying to parse it. The issue is that a document can only have one top-level element; you have two, <TextType> and <Providers>.
Most of your elements are in the namespace Reporting. You need to use it when referencing those elements.
XNamespace ns = "Reporting";
var value = doc.Element(ns + "Value");
Update
Just use the namespace for each element:
XNamespace ns = "Reporting";
var value = xelement.Elements(ns + "Value");
Another Update
The XML document is considered invalid because it has multiple XML declarations; there is no way to disable this. I suggest you pre-process the document to remove the extra declarations. Here's an example (https://dotnetfiddle.net/UnuAF6)
var xml = "<?xml version='1.0'?><a> <?xml version='1.0'?><b id='b' /></a>";
var doc = XDocument.Parse(xml.Replace(" <?xml version='1.0'?>", " "));
var bs = doc.Descendants("b");
Console.WriteLine("{0} 'b' elements", bs.Count());
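Putting the pieces together for the question's document, a rough sketch could look like this. It assumes that stripping every XML declaration with a regular expression is acceptable (that would also alter CDATA content that happens to contain such a declaration), and the inlined XML is a shortened stand-in for the real report. The variable names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Shortened stand-in for the question's document, including the nested declaration.
string raw = @"<?xml version=""1.0""?>
<TextType IsKey=""false"" Name=""XMLReport"">
<Value xmlns=""Reporting"">
<?xml version=""1.0"" encoding=""utf-8""?>
<TestReport>
<ReportSet>
<MyReport ID=""1""><ReportStatus>
<![CDATA[Done]]>
</ReportStatus></MyReport>
<MyReport ID=""3""><ReportStatus>
<![CDATA[Initial]]>
</ReportStatus></MyReport>
</ReportSet>
</TestReport>
</Value>
</TextType>";

// Remove every XML declaration; XDocument.Parse does not need one.
string cleaned = Regex.Replace(raw, @"<\?xml\s[^?]*\?>", "");
XDocument doc = XDocument.Parse(cleaned);

// Elements below <Value> inherit the default namespace "Reporting".
XNamespace ns = "Reporting";

// Trim() removes the line breaks surrounding each CDATA section.
var statuses = doc.Descendants(ns + "ReportStatus")
    .Select(s => s.Value.Trim())
    .ToList();

// True only if there is at least one status and all of them are "Done".
bool allDone = statuses.Count > 0 && statuses.All(s => s == "Done");
Console.WriteLine(allDone);
```

Using `All` avoids counting two sequences and comparing the counts, and the explicit `Count > 0` check keeps an empty ReportSet from being reported as done.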

Xml delete node by elements value

Here is xml file:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Data>
<PageInfo>
<ID>0</ID>
<NUM>5</NUM>
<URL>er.php</URL>
</PageInfo>
<PageInfo>
<ID>1</ID>
<NUM> 12345</NUM>
<URL>/out/out.ViewFolder.php</URL>
</PageInfo>
</Data>
I have tried a lot of ways (for a week now) to delete a certain node (PageInfo) by element value (ID, NUM, URL) in this XML file.
There are few approaches I have tried:
1st approach:
XmlDocument docc = new XmlDocument();
docc.LoadXml(AppDomain.CurrentDomain.BaseDirectory + "/WebData.xml");
XmlNode nodee = docc.SelectSingleNode("/Data/PageInfo/ID[2]");
nodee.RemoveAll();
2nd approach:
XmlDocument document = new XmlDocument();
document.Load(AppDomain.CurrentDomain.BaseDirectory + "/WebData.xml");
XmlNodeList nodes = document.DocumentElement.SelectNodes("/Data/PageInfo");
string ID, NUM, URL;
foreach (XmlNode node in nodes)
{
ID = node.SelectSingleNode("ID").InnerText;
NUM = node.SelectSingleNode("NUM").InnerText;
URL = node.SelectSingleNode("URL").InnerText;
node.RemoveAll();
Console.WriteLine(ID + " " + NUM + " " + URL + "\n");
}
The 1st approach does not throw an exception, but nothing happens; the 2nd approach throws an exception: Data at the root level is invalid.
How would one delete nodes by element value in an XML file? (LINQ is fine)
Disclaimer: none of the solutions I have found on StackOverflow work for my particular case.
Based on the ID, please try this solution :
First approach
string xml = AppDomain.CurrentDomain.BaseDirectory + "/WebData.xml";
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(xml);
XmlNode t = xmlDoc.SelectSingleNode("/Data/PageInfo[ID='0']");
t.ParentNode.RemoveChild(t);
xmlDoc.Save(xml);
Second approach : Linq
XDocument xmlDoc = XDocument.Load(xml);
var pageInfo = (from xml2 in xmlDoc.Descendants("PageInfo")
where xml2.Element("ID").Value == "0"
|| xml2.Element("NUM").Value == "5"
|| xml2.Element("URL").Value == "er.php"
select xml2).FirstOrDefault();
pageInfo.Remove();
xmlDoc.Save(xml);
// output
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Data>
<PageInfo>
<ID>1</ID>
<NUM> 12345</NUM>
<URL>/out/out.ViewFolder.php</URL>
</PageInfo>
</Data>
Since <Data> is effectively an array, you can deserialize it into a Data class that holds a List<PageInfo>, update the list as needed, and then serialize it back to your file.
Example:
[XmlRoot("Data")]
public class Data
{
// XmlElement maps each <PageInfo> child of <Data> directly onto the list
[XmlElement("PageInfo")]
public List<PageInfo> pageInfos = new List<PageInfo>();
}
public class PageInfo
{
public int ID;
public int NUM;
public string URL;
}
Now you can apply queries on your list and then serialize your Data class back to the file. See this link for a serialization guide.

Search through XML and grab another Node

<?xml version="1.0" encoding="UTF-8"?>
<Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Message>
<MessageID>1</MessageID>
<Product>
<SKU>33333-01</SKU>
</Product>
</Message>
</Envelope>
I've tried googling, but I don't know whether I'm just not using the right search terms.
I want to be able to search the XML file based on the MessageID and then grab the SKU.
I then want to search another XML file based on the SKU and remove that message completely.
<?xml version="1.0" encoding="UTF-8"?>
<Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Message>
<MessageID>1</MessageID>
<Inventory>
<SKU>33333-01</SKU>
<Quantity>1</Quantity>
</Inventory>
</Message>
<Message>
<MessageID>2</MessageID>
<Inventory>
<SKU>22222-01</SKU>
<Quantity>1</Quantity>
</Inventory>
</Message>
</Envelope>
Meaning the XML above becomes:
<?xml version="1.0" encoding="UTF-8"?>
<Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Message>
<MessageID>2</MessageID>
<Inventory>
<SKU>22222-01</SKU>
<Quantity>1</Quantity>
</Inventory>
</Message>
</Envelope>
To clarify, I cannot guarantee that the MessageID will be the same across different XML files.
Thanks in advance for any help.
My questions:
How do I search through XML files?
How do I then grab another node's details?
Can I remove a complete element from an XML file based on a search?
You can use XmlDocument to load your XML document. Then, you can use XPath for searching any nodes.
XmlDocument document = new XmlDocument();
document.Load(@"C:\fileOnTheDisk.xml"); // verbatim string, so the backslash is not treated as an escape
// or
document.LoadXml("<a>someXmlString</a>");
// Returns single element or null if not found
var singleNode = document.SelectSingleNode("Envelope/Message[MessageID = '1']");
// Returns a NodeList
var nodesList = document.SelectNodes("Envelope/Message[MessageID = '1']");
Read more about XPath at w3schools.com.
Here is a good XPath Tester.
For example, you can use the following XPath to find nodes in your document by ID:
XmlDocument document = new XmlDocument();
document.Load(@"C:\doc.xml");
var node = document.SelectSingleNode("Envelope/Message[MessageID = '1']");
var sku = node.SelectSingleNode("Inventory/SKU").InnerText;
Console.WriteLine("{0} node has SKU = {1}", 1, sku);
Or you can output all SKUs:
foreach (XmlNode node in document.SelectNodes("Envelope/Message"))
{
Console.WriteLine("{0} node has SKU = {1}",
node.SelectSingleNode("MessageID").InnerText,
node.SelectSingleNode("Inventory/SKU").InnerText);
}
It will produce:
1 node has SKU = 33333-01
2 node has SKU = 22222-01
Note that there are possible NullReferenceExceptions if nodes are not present.
You can simply remove it using the RemoveChild() method of its parent.
XmlDocument document = new XmlDocument();
document.Load(@"C:\doc.xml");
var node = document.SelectSingleNode("Envelope/Message[MessageID = '1']");
node.ParentNode.RemoveChild(node);
document.Save(@"C:\docNew.xml"); // will be without Message 1
You can use Linq to XML to do this:
var doc= XDocument.Load("input.xml");//path of your xml file in which you want to search based on message id.
var searchNode= doc.Descendants("MessageID").FirstOrDefault(d => d.Value == "1");// It will search message node where its value is 1 and get first of it
if(searchNode!=null)
{
var SKU=searchNode.Parent.Descendants("SKU").FirstOrDefault();
if(SKU!=null)
{
var searchDoc=XDocument.Load("search.xml");//path of xml file where you want to search based on SKU value.
var nodes =searchDoc.Descendants("SKU").Where(d=>d.Value==SKU.Value).Select(d=>d.Parent.Parent).ToList();
nodes.ForEach(node=>node.Remove());
searchDoc.Save("output.xml");//path of output file
}
}
I'd recommend you did this using LINQ to XML - it's much nicer to work with than the old XmlDocument API.
For all the examples, you can parse your XML string xml to an XDocument like so:
var doc = XDocument.Parse(xml);
1. How do I search through XML files?
You can get the SKU for a specific message ID by querying your document:
var sku = (string)doc.Descendants("Message")
.Where(e => (int)e.Element("MessageID") == 1)
.SelectMany(e => e.Descendants("SKU"))
.Single();
2. How do I then grab another Nodes details?
You can get the Message element with a specified SKU using another query:
var message = doc.Descendants("SKU")
.Where(sku => (string)sku == "33333-01")
.SelectMany(e => e.Ancestors("Message"))
.Single();
3. Can I remove a complete element from an XML file based on a search?
Using your result from step 2, you can simply call Remove:
message.Remove();
Alternatively, you can combine the query from step 2 and simply execute a command to remove any messages that have a specific SKU:
doc.Descendants("SKU")
.Where(sku => (string)sku == "33333-01")
.SelectMany(e => e.Ancestors("Message"))
.Remove();
I tried to answer all your questions:
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;
XDocument xdoc1 = XDocument.Load("xml1.xml");
XDocument xdoc2 = XDocument.Load("xml2.xml");
string sku = String.Empty;
string searchedID = "2";
//1.searching through an xml file based on path
foreach (XElement message in xdoc1.XPathSelectElements("Envelope/Message"))
{
if (message.Element("MessageID").Value.Equals(searchedID))
{
//2.grabbing another node's details
sku = message.XPathSelectElement("Inventory/SKU").Value;
}
}
// ToList() takes a snapshot first, so removing elements does not disturb the enumeration
foreach (XElement message in xdoc2.XPathSelectElements("Envelope/Message").ToList())
{
XElement skuElement = message.XPathSelectElement("Inventory/SKU");
if (skuElement != null && skuElement.Value.Equals(sku))
{
//3.removing a node
message.Remove();
}
}
xdoc2.Save("xml2_del.xml");

Looping through XML Document

My Method:
if (File.Exists(@"C:\config.xml"))
{
System.Xml.XmlDocument xd = new System.Xml.XmlDocument();
xd.Load(@"C:\config.xml");
System.Xml.XmlElement root = xd.DocumentElement;
System.Xml.XmlNodeList nl = root.SelectNodes("/config");
foreach (System.Xml.XmlNode xnode in nl)
{
string name = xnode.Name;
string value = xnode.InnerText;
string nv = name + "|" + value;
Send(nv);
}
My Xml Doc
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<config>
<bla>D</bla>
<def>300</def>
<ttOUT>34000</ttOUT>
<num>3800</num>
<pw>help</pw>
<err>1</err>
....and so on
</config>
Now my method returns the first 2 and nothing else.
What am I doing wrong?
Use the System.Xml namespace to avoid long type qualifications, i.e.:
using System.Xml;
Then try something like this:
XmlNodeList nl = xd.SelectNodes("config");
XmlNode root = nl[0];
foreach (XmlNode xnode in root.ChildNodes)
{
string name = xnode.Name;
string value = xnode.InnerText;
string nv = name + "|" + value;
Send(nv);
}
I believe there is something wrong with your method.
a) I don't think SelectNodes should take the /config argument, rather it should take config.
b) After selecting the first (and only, since XML files in .NET must have exactly one root node) root node, you need to iterate through the ChildNodes of the root.
root is the <config> tag, so I don't understand how root.SelectNodes("/config") should work at all. Use root.ChildNodes instead.
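The same loop can also be sketched with LINQ to XML. This is an illustration under assumptions: the config is inlined and shortened, and the name|value strings are collected in a list called `pairs` instead of being passed to the question's Send method, which is not shown in the original.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Shortened copy of the question's config, inlined so the sketch is self-contained.
string xml = @"<config><bla>D</bla><def>300</def><ttOUT>34000</ttOUT></config>";
XDocument doc = XDocument.Parse(xml);

// Collect "name|value" for every direct child element of the root <config>.
var pairs = new List<string>();
foreach (XElement child in doc.Root.Elements())
{
    pairs.Add(child.Name.LocalName + "|" + child.Value);
}

foreach (string pair in pairs)
    Console.WriteLine(pair);
```

Unlike ChildNodes, `Elements()` returns only element nodes, so comments or stray text directly under the root are skipped.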
