Query XML to filter down to

I'm trying to create a list that is the results of an inner join between two xml lists and then filter on an external list of keys that are in a DataTable. A code example is not necessary as long as I can be pointed to something explaining how to do it.
What I'm working with is the data returned from Microsoft Cognitive Services Text Analytics Topic Detection. There are two sections in the XML. The first is the list of items I need to get. The other has a list of keys I have references for.
An example of the returned XML is on their Quick Start page. Do a find on page for: >>> "status": "Succeeded" <<< to get to it.
In the bottom part of the example XML, the "topicAssignments" section, you'll see the value "documentId": "1". That's the key I have in the DataTable. Each document also has one or more "topicId" values associated with it. That's where the relationship between the "topics" section and the "topicAssignments" section is set.
Thanks in advance.

As often as not, writing up the question clarified my ideas enough to Google it successfully.
https://www.google.com/search?q=linq+inner+join+on+xml&rlz=1C1CHFX_enUS631US631&oq=linq+inner+join+on+xml
https://blogs.msdn.microsoft.com/wriju/2008/03/24/linq-to-xml-join-xml-data/
http://www.c-sharpcorner.com/blogs/joins-using-linqtoxml1
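For anyone landing here later, this is a minimal sketch of the join-then-filter idea using LINQ to XML. The element and attribute names (`topic`, `assignment`, `id`, `topicId`, `documentId`, `keyPhrase`) are assumptions made up for illustration, not the service's actual schema, and the key set stands in for the DataTable column (something like `table.AsEnumerable().Select(r => r.Field<string>("documentId"))` could feed it, with a hypothetical column name).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TopicJoin
{
    // Inner-joins <topic> elements to <assignment> elements on the topic id,
    // then keeps only rows whose documentId is in the supplied key set.
    // All element/attribute names here are hypothetical.
    public static List<(string DocumentId, string KeyPhrase)> Query(
        XDocument doc, ISet<string> wantedDocumentIds)
    {
        var query =
            from topic in doc.Descendants("topic")
            join assignment in doc.Descendants("assignment")
                on (string)topic.Attribute("id")
                equals (string)assignment.Attribute("topicId")
            let documentId = (string)assignment.Attribute("documentId")
            where wantedDocumentIds.Contains(documentId)
            select (DocumentId: documentId,
                    KeyPhrase: (string)topic.Attribute("keyPhrase"));

        return query.ToList();
    }
}
```

Putting the keys in a `HashSet<string>` keeps the filter step cheap; the join itself is the standard LINQ `join ... on ... equals` clause, so it behaves like an inner join and drops topics that have no assignment.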

Related

Get all Nodes and Relationships with Properties outgoing from one StartNode

I'm completely new to the graph database world and to writing Cypher statements.
I have a project where I want to store wiring diagram information in a graph database (Neo4j). There are different types of nodes, e.g. a WiringDiagram [WD] node (which will be my start node in many cases), and all components such as fuse boxes, ICUs or sensors are linked to this WD node via relationships. Plugs can also contain pins, and pins are connected via connection lines.
The first version is already stored in the Neo4j database; have a look at the following image.
Now my question is how best to post-process this data. I want to extract the data for one specific wiring diagram.
So if I ask for all information about the WiringDiagram with ID 123, I should get all the components, pins and connection lines that belong to it. What should the Cypher look like here?
Ideally I want the data in C# data models, because afterwards I want to try to generate an SVG from it.
As you can see in the image, the Cypher statement currently looks like this: "MATCH (w:WiringDiagram)<-[r:partOf]-(n)-[*2..]-(l) RETURN * LIMIT 50". But with this statement I get strange results in my C# project...
I would be happy about any help. I'm also open to switching to another programming language if it fits this approach better. Happy to hear any suggestions.

I found an APOC procedure that currently does what I want.
It gets all nodes reachable from the searched one and returns the complete subgraph, including the relationships.
It looks good so far.
Any suggestions on how to store this data in C# data models? (What's the best way?)
// Interpolated verbatim string: {{ }} are escaped braces, "" is an escaped quote.
var result = tx.Run($@"MATCH (p:WiringDiagram {{wiringid:1}})
    CALL apoc.path.subgraphAll(p, {{
        relationshipFilter:
            ""partOf|has_pin|connectedWith"",
        filterStartNode:false,
        minLevel: 0,
        maxLevel: 10
    }})
    YIELD nodes, relationships
    RETURN nodes, relationships;");
After that I have two lists: one with all the nodes, and one with all the relationships, each with its start and end node ID.
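One possible way to hold those two lists on the C# side is a pair of plain models plus a small lookup class. This is only a sketch: `GraphNode`, `GraphEdge` and `WiringGraph` are hypothetical names, and mapping the driver's node and relationship objects into them is left out because it depends on the driver version in use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical plain models; names and shapes are illustrative only.
public sealed record GraphNode(long Id, string Label,
    IReadOnlyDictionary<string, object> Properties);
public sealed record GraphEdge(long StartId, long EndId, string Type);

public sealed class WiringGraph
{
    private readonly Dictionary<long, GraphNode> _nodes;
    private readonly ILookup<long, GraphEdge> _outgoing;

    public WiringGraph(IEnumerable<GraphNode> nodes, IEnumerable<GraphEdge> edges)
    {
        _nodes = nodes.ToDictionary(n => n.Id);      // index nodes by id
        _outgoing = edges.ToLookup(e => e.StartId);  // group edges by start node
    }

    // Nodes reached from nodeId over outgoing relationships of the given type.
    public IEnumerable<GraphNode> Neighbours(long nodeId, string relationshipType) =>
        _outgoing[nodeId]
            .Where(e => e.Type == relationshipType && _nodes.ContainsKey(e.EndId))
            .Select(e => _nodes[e.EndId]);

    // All nodes carrying the given label, e.g. "Pin".
    public IEnumerable<GraphNode> OfLabel(string label) =>
        _nodes.Values.Where(n => n.Label == label);
}
```

With the nodes indexed by ID and the relationships grouped by start node, walking from a plug to its pins (or from a pin along `connectedWith`) becomes a dictionary lookup, which should be enough to drive something like an SVG generator.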

Revit : Get All ViewSheet is very slow

For a plugin, I need to get all the ViewSheets in the .rvt file and display information from them in a XAML dialog,
but my process is very slow the first time I use it
(with the debugger: 500 ms for 83 view plans; it is very slow without the debugger too).
(If I execute my code again, the execution is instantaneous.)
My code is below.
Can you help me?
Thanks in advance,
Luc
protected IEnumerable<Element> GetAllEl(Document document)
{
    var filteredElementCollector = new FilteredElementCollector(document);
    filteredElementCollector = filteredElementCollector
        .OfCategory(BuiltInCategory.OST_Sheets)
        .WhereElementIsNotElementType()
        .OfClass(typeof(ViewSheet));
    var fcElements = filteredElementCollector.ToElements();
    return fcElements;
}

I do not think there is currently a known generic solution for that problem.
Here is a recent discussion with the development team on this:
Question: for a given element id, we need to find the list of sheet ids displaying it.
Current solution: we loop through all the sheets and views and use FilteredElementCollector( doc, sheet.Id)
With the results from that, we perform one more call to FilteredElementCollector( doc, view.Id) and look for the element id.
Issue: the current solution takes a lot of time and displays a Revit progress bar saying Generating graphics.
Is there any better way to know if a given element id is available in the sheet or not?
For example, something like this would be very useful:
getAllSheets(ElementId) // returns array of sheet id
hasGuid(ElementId,sheetId) // return true/false
Does the API provide any such methods, to check whether a given ElementId is available in the sheet?
Answer: So the goal is to find a view that displays a particular element on a sheet?
Many model elements could be visible on multiple views, while most annotation elements are typically present only in one view.
What type of elements are you checking for?
And what will you do with that info?
Response: the goal is to find a view that displays a particular element on a sheet.
It can be any type of element.
Answer: Here are some previous related discussions:
Determining Views Showing an Element
The inverse, Retrieving Elements Visible in View
Response: The problem is that the first call to FilteredElementCollector( doc, viewId ) shows generating graphics in the progress bar.
Only the first time search does so. The second time, search on the same view has no issues with performance.
Answer: The first time is slow because in order to iterate on the elements visible in a view the graphics for that view must be generated.
I can't think of a workaround to get a precise answer.
You might be able to skip sheets which don't have model views in their viewport list to save a bit of time.
Some sheets may only have drafting views and schedules and annotations.
The development team provided a very helpful suggestion that helped work around the generating graphics call in a special case,
described in "Loop through sheets - generating graphics".
Maybe you can optimise in a similar manner for your specific case?

I think you may be over-filtering the ElementCollector. In my add-in, I just use this code to get the view sheets: new FilteredElementCollector(_doc).OfClass(typeof(ViewSheet));

Duplicate Records within CRM

We recently had a migration project that went badly wrong, and we now have thousands of duplicate records. The business has been working with them, which has made the issue worse, as we now have records that have the same name and address but could have different contact information. A small number are exact duplicates. We have started the painful process of manually merging the records, but this is very slow. Can anyone suggest another way of tackling the problem, please?

You can quickly write a console app to merge them; refer to the MSDN sample code for this.
Sample: Merge two records
// Create the target for the request.
EntityReference target = new EntityReference();
// Id is the GUID of the account that is being merged into.
// LogicalName is the type of the entity being merged to, as a string
target.Id = _account1Id;
target.LogicalName = Account.EntityLogicalName;
// Create the request.
MergeRequest merge = new MergeRequest();
// SubordinateId is the GUID of the account merging.
merge.SubordinateId = _account2Id;
merge.Target = target;
merge.PerformParentingChecks = false;
// Execute the request.
MergeResponse merged = (MergeResponse)_serviceProxy.Execute(merge);
When merging two records, you specify one record as the master record, and Microsoft Dynamics CRM treats the other record as the child record or subordinate record. It deactivates the child record and copies all of the related records (such as activities, contacts, addresses, cases, notes, and opportunities) to the master record.

Building on @Arun Vinoth's answer, you might want to see what you can leverage with out-of-box duplicate detection to get sets of duplicates to apply the merge automation to.
Alternatively you can build your own dupe detection to match records on the various fields where you know dupes exist. I've done similar things to compare records across systems, including creating match codes to mimic how Microsoft does their dupe detection in CRM.
For example, a contact's match codes might be
1. the email address
2. the first name, last name, and company concatenated together without spaces.
If you need to match companies, you can implement an algorithm like Scribe's stripcompany to generate match codes based on company names.
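As a rough illustration of the two contact match codes above, here is a minimal sketch. `ContactRow` is a hypothetical stand-in for however you pull contacts out of CRM, and the lower-casing is my own assumption about normalization rather than a description of how Microsoft builds its match codes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat view of a contact; not a CRM SDK type.
public sealed record ContactRow(Guid Id, string Email, string FirstName,
    string LastName, string Company);

public static class MatchCodes
{
    // Removes all whitespace and lower-cases the value (normalization is an assumption).
    private static string Squash(string value) =>
        new string((value ?? "").Where(ch => !char.IsWhiteSpace(ch)).ToArray())
            .ToLowerInvariant();

    // Match code 1: the email address.
    public static string ForEmail(ContactRow c) =>
        (c.Email ?? "").Trim().ToLowerInvariant();

    // Match code 2: first name, last name and company concatenated without spaces.
    public static string ForName(ContactRow c) =>
        Squash(c.FirstName) + Squash(c.LastName) + Squash(c.Company);

    // Groups contacts by a match code and returns only groups with more than one member.
    public static List<List<ContactRow>> DuplicateSets(
        IEnumerable<ContactRow> contacts, Func<ContactRow, string> code) =>
        contacts.Where(c => code(c).Length > 0)
                .GroupBy(code)
                .Where(g => g.Count() > 1)
                .Select(g => g.ToList())
                .ToList();
}
```

Each returned set could then be fed to the merge automation, with whichever rule you choose for picking the master record.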
Since this seems like a huge problem you may want to consider drastic solutions like deactivating the entire polluted data set and redoing the data import clean, then finding any of the deactivated records that got touched in the interim to merge them, then deleting the entire polluted (deactivated) data set.
Bottom line, all paths seem to lead to major headaches and the only consolation is that you get to choose which path to follow.

Word translator from XML file

I have to make a program which translates a word from one language to another. For example, if I call translate("Hello","FR"), the method must return "Bonjour".
The data is held in a .NET dictionary in a memory cache.
First, I have to write the translations to an XML file, but I don't know how to organize it or how to read it.
I'll have one dictionary per language. For example, I'll have
EN, which contains 3 keys, "Bonjour" - "Ola" - "Gutentag", all with the same value, which is "Hello".
So, when I receive ("Bonjour", "EN"), I'll look in the EN dictionary and return the value for the key Bonjour.
But I really don't see how to organize this in XML in the first place so I can set up the whole system.
Is this a possibility?
<dico>
  <en>
    <traduction id="bonjour" name="hello"/>
    <traduction id="hola" name="hello"/>
    <traduction id="dormir" name="to sleep"/>
    <traduction id="geld" name="argent"/>
    <traduction id="por favor" name="please"/>
  </en>
  <fr>
    ...
  </fr>
</dico>
Can you help me please?

This looks fine to me.
For your question about how to read such a file, see the question "How to read xml file in C#".
In order to read id and name value, use node.Attributes["id"].Value.
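To make that concrete, here is a minimal sketch that loads the layout from the question into one dictionary per language with `XmlDocument`. The `Translator` class and its method names are made up for illustration, and it assumes every `traduction` element carries both an `id` and a `name` attribute.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class Translator
{
    // Builds language -> (source word -> translation) from the <dico> layout.
    public static Dictionary<string, Dictionary<string, string>> Load(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var byLanguage = new Dictionary<string, Dictionary<string, string>>(
            StringComparer.OrdinalIgnoreCase);

        // Each child of <dico> is a language element such as <en> or <fr>.
        foreach (var language in doc.DocumentElement.ChildNodes.OfType<XmlElement>())
        {
            var words = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            foreach (var entry in language.ChildNodes.OfType<XmlElement>())
            {
                // Assumes both attributes are present on every entry.
                words[entry.Attributes["id"].Value] = entry.Attributes["name"].Value;
            }
            byLanguage[language.Name] = words;
        }
        return byLanguage;
    }

    // Returns the translation, or null when the language or word is unknown.
    public static string Translate(
        Dictionary<string, Dictionary<string, string>> dictionaries,
        string word, string targetLanguage) =>
        dictionaries.TryGetValue(targetLanguage, out var words)
            && words.TryGetValue(word, out var translation)
            ? translation
            : null;
}
```

The case-insensitive comparers are a design choice so that ("Bonjour", "EN") finds the `bonjour` entry under `<en>`; drop them if the lookup should be case-sensitive.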

Following on from my comment above, a better model might be to have the dictionary key above the language element, e.g.
<dico>
  <lex id="hello">
    <en>hello</en>
    <fr>bonjour</fr>
    ...
  </lex>
  ...
</dico>
Although that might not work with the way that you need to query it, particularly when going from another language to English (or the language you use for the key).

I need to parse an HTML formatted country list into SQL inserts. Is there an easier way to do this?

There are about 2000 lines of this, so doing it manually would probably take more work than figuring out a way to do this programmatically. It only needs to work once, so I'm not concerned with performance or anything.
<tr><td>Canada (CA)</td><td>Alberta (AB)</td></tr>
<tr><td>Canada (CA)</td><td>British Columbia (BC)</td></tr>
<tr><td>Canada (CA)</td><td>Manitoba (MB)</td></tr>
Basically it's formatted like this, and I need to divide it into 4 parts: Country Name, Country Abbreviation, Division Name and Division Abbreviation.
In keeping with my complete lack of efficiency, I was planning just to do a string.Replace on the HTML tags after I broke them up, then find the index of the opening brackets and grab the remaining space-delimited strings. Then I realized I have no way of keeping track of which is the country and which is the division, or of grouping them by country.
So is there a better way to do this? Or better yet, an easier way to populate a database with countries and provinces/states? I looked around SO, and the only readily available databases I can find don't provide the full names of the countries or the provinces/states, or use IPs instead of geographic names.

Paste it into a spreadsheet. Some spreadsheets will parse the HTML table for you.
Save it as a .CSV file and process it that way. Or add a column to the spreadsheet that says something like the following:
="INSERT INTO COUNTRY(CODE,NAME) VALUES ('" & A1 & "','" & B1 & "');"
Then you have a column of INSERT statements that you can cut, paste and execute.
Edit
Be sure to include the <table> tag when pasting into a spreadsheet.
<table><tr><th>country</th><th>name</th></tr>
<tr><td>Canada (CA)</td><td>Alberta (AB)</td></tr>
<tr><td>Canada (CA)</td><td>British Columbia (BC)</td></tr>
<tr><td>Canada (CA)</td><td>Manitoba (MB)</td></tr>
</table>
Processing a CSV file requires almost no parsing. It's got quotes and commas. Much easier to live with than XML/HTML.

/<tr><td>(.+?)\s\(([^)]+)\)<\/td><td>(.+?)\s\(([^)]+)\)<\/td><\/tr>/
Then you should have 4 captures with the 4 pieces of data from any PCRE engine :)
Alternatively, something like http://jacksleight.com/assets/blog/really-shiny/scripts/table-extractor.txt provides more completeness.
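In C#, the four-capture idea could look roughly like the sketch below. The pattern here is my own variant, written to allow spaces in names such as "British Columbia" and multi-letter codes, and `RowParser` is a hypothetical name; it assumes each row looks exactly like the sample, with no extra attributes or nested parentheses.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RowParser
{
    // Four captures: country name, country code, division name, division code.
    private static readonly Regex Row = new Regex(
        @"<tr><td>(.+?)\s\(([^)]+)\)</td><td>(.+?)\s\(([^)]+)\)</td></tr>",
        RegexOptions.Compiled);

    public static List<(string Country, string CountryCode, string Division, string DivisionCode)>
        Parse(string html)
    {
        var rows = new List<(string Country, string CountryCode, string Division, string DivisionCode)>();
        foreach (Match match in Row.Matches(html))
        {
            rows.Add((match.Groups[1].Value, match.Groups[2].Value,
                      match.Groups[3].Value, match.Groups[4].Value));
        }
        return rows;
    }
}
```

Because every row carries both the country and the division, grouping by country afterwards is a plain `GroupBy(r => r.CountryCode)` on the result.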

Sounds like a problem easily solved by a Regex.

I recently learned that if you open a url from Excel it will try and parse out the table data.

If you are able to see this table in the browser (Internet explorer), you can select the entire table, right click & "Export to Microsoft Excel"
That should help you get data into separate columns, I guess.

Do you have to do this programmatically? If not, may I suggest just copying and pasting the table (from the browser) into MS Excel and then clearing all formats? This way you get a nice table that can then be imported into your database without problems.
just a suggestion... hth

An assembly exists for .NET called System.Xml; you can reference the assembly and load your HTML document into a System.Xml.XmlDocument. You can then easily pinpoint the HTML node that contains your required data and use its child nodes to add to your data. This requires little string parsing on your part.

Load the HTML data as XElements, use LINQ to grab the values you need, and then INSERT.
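A minimal sketch of that approach follows. It assumes the input is just the `<tr>` rows from the question and that they are well-formed XML (no unclosed tags or HTML-only entities), that every cell ends in " (CODE)", and the `division` table and its column names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DivisionInserts
{
    // Splits "British Columbia (BC)" into ("British Columbia", "BC").
    private static (string Name, string Code) Split(string cell)
    {
        int open = cell.LastIndexOf(" (", StringComparison.Ordinal);
        return (cell.Substring(0, open), cell.Substring(open + 2).TrimEnd(')'));
    }

    // Wraps a value in single quotes, doubling any embedded quote for SQL.
    private static string Quote(string value) => "'" + value.Replace("'", "''") + "'";

    public static List<string> Build(string rowsHtml)
    {
        // Wrap the bare rows in a root element so they parse as one XML fragment.
        var table = XElement.Parse("<table>" + rowsHtml + "</table>");

        return (from tr in table.Elements("tr")
                let cells = tr.Elements("td").Select(td => td.Value).ToList()
                where cells.Count == 2
                let country = Split(cells[0])
                let division = Split(cells[1])
                select "INSERT INTO division (country_code, country_name, code, name) VALUES ("
                     + Quote(country.Code) + ", " + Quote(country.Name) + ", "
                     + Quote(division.Code) + ", " + Quote(division.Name) + ");")
               .ToList();
    }
}
```

Since this only needs to run once, writing the resulting statements to a .sql file and executing that is probably simpler than wiring up a database connection.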

Blowing my own trumpet here but my FOSS tool CSVfix will do it with a combination of the read_xml and sql_insert commands.
