I'm completely new to the graph database world and also to writing Cypher statements.
I have a project where I want to store wiring diagram information inside a graph database (Neo4j). There are different types of nodes, e.g. a WiringDiagram [WD] node (which will be my start node in many cases), and all components such as fuseboxes, ICUs or sensors are linked to this WD node via relationships. Plugs can also contain Pins, and Pins are connected via Connectionlines.
The first version is already stored in the Neo4j database; have a look at the following image.
Now I have a question about the best way to post-process this data. I want to extract the data for one specific wiring diagram.
So if I ask for all information about the WiringDiagram with ID 123, I should get all components, Pins and Connectionlines that belong to it. What should the Cypher query look like here?
Ideally I want the data in C# data models (if possible), because afterwards I want to try to generate an SVG out of the data.
As you can see in the image, the Cypher statement currently looks like this: "MATCH (w:WiringDiagram)<-[r:partOf]-(n)-[*2..]-(l) RETURN * LIMIT 50". But with this statement I get strange results in my C# project...
I would be happy about any help. I'm also open to moving forward with another programming language if it fits this approach better. Happy to hear any suggestions.
I found an APOC function which does what I want:
get all nodes after the searched one and return the complete subgraph with relationships.
This currently looks good.
Any suggestion on how to store this data in C# data models? (What's the best way?)
var result = tx.Run($@"MATCH (p:WiringDiagram {{wiringid:1}})
    CALL apoc.path.subgraphAll(p, {{
        relationshipFilter: ""partOf|has_pin|connectedWith"",
        filterStartNode: false,
        minLevel: 0,
        maxLevel: 10
    }})
    YIELD nodes, relationships
    RETURN nodes, relationships;");
And after that I've got two lists: one with all the nodes, and one with all the relationships including start/end node IDs.
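If it helps, here is a minimal sketch of mapping those two lists into plain C# data models with the official Neo4j.Driver package; the GraphNode/GraphRelationship classes and the SubgraphMapper helper are illustrative names I made up, not part of the driver.

using System.Collections.Generic;
using System.Linq;
using Neo4j.Driver;

public class GraphNode
{
    public long Id { get; set; }
    public IReadOnlyList<string> Labels { get; set; }
    public IReadOnlyDictionary<string, object> Properties { get; set; }
}

public class GraphRelationship
{
    public long Id { get; set; }
    public string Type { get; set; }
    public long StartNodeId { get; set; }
    public long EndNodeId { get; set; }
    public IReadOnlyDictionary<string, object> Properties { get; set; }
}

public static class SubgraphMapper
{
    // Map the single record returned by apoc.path.subgraphAll into plain C# models.
    public static (List<GraphNode> Nodes, List<GraphRelationship> Relationships) MapSubgraph(IRecord record)
    {
        var nodes = record["nodes"].As<List<INode>>()
            .Select(n => new GraphNode
            {
                Id = n.Id,
                Labels = n.Labels,
                Properties = n.Properties
            })
            .ToList();

        var relationships = record["relationships"].As<List<IRelationship>>()
            .Select(r => new GraphRelationship
            {
                Id = r.Id,
                Type = r.Type,
                StartNodeId = r.StartNodeId,
                EndNodeId = r.EndNodeId,
                Properties = r.Properties
            })
            .ToList();

        return (nodes, relationships);
    }
}

From there, generating the SVG becomes a matter of walking the relationship list and looking up the start/end nodes by Id in a dictionary.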
I'm currently testing various reporting tools and came across List & Label from Combit, which makes a very solid impression. As part of a PoC, I first integrated it into a simple C# WinForms application and attached a SQL Server database.
...
using(ListLabel myLL = new ListLabel())
{
SqlConnection connection = new SqlConnection(Properties.Settings.Default.ConnectionString);
myLL.DataSource = new SqlConnectionDataProvider(connection);
myLL.Design();
}
...
However, I don't understand why I can't also see the relational structure from the database in the designer's data structure... all relations seem to be simply ignored. I can only see all the tables at the root level - even though the relations exist in the SQL Server database.
Couldn't find any information on this so far, unfortunately - any ideas?
Actually, it's usually all there. You should be able to see the full structure as soon as you add a report container:
This way, you can add e.g. "Order Details" either as a sub-element of the customers (as shown in the screenshot) or as a top-level element (if it had been selected from the root of the list).
The field tree, on the other hand, just shows the available tables with their contents ("fields"). As each table may appear at different levels of the hierarchy, it's only added once to the tree, while the hierarchy is defined when adding a new table.
One exception to this rule is the 1:1 related identifiers of parent tables. As it might well be required to print e.g. customer data related to an order line in an "Orders" table, you can access them directly from the field tree:
Thus, you actually do see the relations there, albeit in reverse order. While this might seem confusing at first it really makes sense once you get your head around the concept.
I want to use ML.NET multi-class classification in my current project, which collects error logs from one of my company's systems.
The point is to add tags to errors and, at some point in the future, train a model to predict and assign tags to incoming logs.
I'm using Model Builder and I can't see my table relations. I store all logs in one table, tags in another, and all relations in a third one.
|Logs| <-- |LogId|TagId| --> |Tags|
My goal is to classify records with a TagId based on the Logs table - is that possible, or do I have to have everything in one table?
Generally speaking, machine learning algorithms deal with fully 'denormalized' and 'prepared' data: every training example is a vector of floats ('features') plus one 'ground truth' value.
ML.NET helps with some of the typical pre-processing tasks, like text featurization, one-hot encoding, and rescaling/normalization, but it provides essentially no 'relational' functionality (no JOINs).
So, you should denormalize / 'flatten' your data before you pass it to ML.NET.
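As a rough sketch of what that flattening could look like in C#, assuming hypothetical LogEntry/TagEntry/LogTag POCOs loaded from your three tables (only LoadFromEnumerable is actual ML.NET API here):

using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

public class LogEntry { public int LogId; public string Message; }
public class TagEntry { public int TagId; public string Name; }
public class LogTag { public int LogId; public int TagId; }

public class TrainingRow
{
    public string Message { get; set; }   // feature text taken from the log
    public string Label { get; set; }     // tag name to predict
}

public static class Denormalizer
{
    public static IDataView Flatten(
        MLContext mlContext,
        IEnumerable<LogEntry> logs,
        IEnumerable<TagEntry> tags,
        IEnumerable<LogTag> links)
    {
        // Join the three tables in memory so each (log, tag) link becomes one flat training row.
        var rows =
            from link in links
            join log in logs on link.LogId equals log.LogId
            join tag in tags on link.TagId equals tag.TagId
            select new TrainingRow { Message = log.Message, Label = tag.Name };

        return mlContext.Data.LoadFromEnumerable(rows.ToList());
    }
}

From there you would typically map Label through MapValueToKey and Message through FeaturizeText before handing the data to a multiclass trainer.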
We recently had a migration project that went badly wrong and we now have thousands of duplicate records. The business has been working with them, which has made the issue worse, as we now have records that have the same name and address but could have different contact information. A small number are exact duplicates. We have started the painful process of manually merging the records, but this is very slow. Can anyone suggest another way of tackling the problem, please?
You can quickly write a console app to merge them; refer to the MSDN sample code for this.
Sample: Merge two records
// Create the target for the request.
EntityReference target = new EntityReference();
// Id is the GUID of the account that is being merged into.
// LogicalName is the type of the entity being merged to, as a string
target.Id = _account1Id;
target.LogicalName = Account.EntityLogicalName;
// Create the request.
MergeRequest merge = new MergeRequest();
// SubordinateId is the GUID of the account merging.
merge.SubordinateId = _account2Id;
merge.Target = target;
merge.PerformParentingChecks = false;
// Execute the request.
MergeResponse merged = (MergeResponse)_serviceProxy.Execute(merge);
When merging two records, you specify one record as the master record, and Microsoft Dynamics CRM treats the other record as the child record or subordinate record. It deactivates the child record and copies all of the related records (such as activities, contacts, addresses, cases, notes, and opportunities) to the master record.
Read more
Building on @Arun Vinoth's answer, you might want to see what you can leverage with out-of-the-box duplicate detection to get sets of duplicates to apply the merge automation to.
Alternatively you can build your own dupe detection to match records on the various fields where you know dupes exist. I've done similar things to compare records across systems, including creating match codes to mimic how Microsoft does their dupe detection in CRM.
For example, a contact's match codes might be
1. the email address
2. the first name, last name, and company concatenated together without spaces.
If you need to match companies, you can implement an algorithm like Scribe's stripcompany to generate match codes based on company names.
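For illustration, a match-code generator along those lines might look like the sketch below; the exact normalization rules are an assumption on my part, not Scribe's or Microsoft's actual algorithm.

using System.Text.RegularExpressions;

public static class MatchCodes
{
    // Match code 1: the normalized email address.
    public static string FromEmail(string email) =>
        (email ?? string.Empty).Trim().ToLowerInvariant();

    // Match code 2: first name + last name + company, concatenated with spaces and punctuation stripped.
    public static string FromNameAndCompany(string firstName, string lastName, string company)
    {
        string combined = string.Concat(firstName, lastName, company);
        return Regex.Replace(combined, "[^A-Za-z0-9]", "").ToLowerInvariant();
    }
}

Grouping contacts by match code and sending every group with more than one member through the MergeRequest automation above would then cover the non-exact duplicates.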
Since this seems like a huge problem you may want to consider drastic solutions like deactivating the entire polluted data set and redoing the data import clean, then finding any of the deactivated records that got touched in the interim to merge them, then deleting the entire polluted (deactivated) data set.
Bottom line, all paths seem to lead to major headaches and the only consolation is that you get to choose which path to follow.
I'm a PHP programmer, and I'm trying to understand some code which I think is ASP.NET. This is also my first foray into XML. I don't have access to a Windows box to test on.
I need to produce XML output that third-party code can use. The third party wants to use our data instead of the data source they are currently using. I don't want to replicate the current XML structure exactly because it doesn't map well to our data.
The structure of the current XML is very flat. There are only a few nested elements and the third party doesn't make use of any of them. The third party does have a sub-contracted programmer, but he is very busy. Also, I want to understand, for myself, how this works.
This is an excerpt from a plugin for a custom CMS:
Dim obj_set As New Data.DataSet()
Using obj_reader As New System.Xml.XmlTextReader("http://www.example.com/xml_output.php")
obj_set.ReadXml(obj_reader)
End Using
Dim obj_view As Data.DataView = obj_set.Tables("profile").DefaultView
obj_view.Sort = "cname"
Dim obj_data As Data.DataTable = obj_view.ToTable()
So from what I have gathered so far, this code
reads the XML file into a DataSet
sorts the profile table by cname
creates a new DataTable from the sorted view
There is other code that stores the new table to, and retrieves it from, cache. Then there is code that loops through the table rows and maps the column names to template variables.
Sample excerpt of current XML structure:
<profiles>
<profile>
<cname>ABC Corporation</cname>
<fname>John</fname>
<lname>Smith</lname>
<sector>Widgets</sector>
<subsectors>
<subsector>Basic Widgets</subsector>
<subsector>Fancy Widgets</subsector>
</subsectors>
</profile>
</profiles>
So what happens to the subsectors data? Does the reader create a separate table for it? If so, how are the tables related?
Our data includes multiple contacts per company. I could just create multiple elements at the top level fname1, fname2, fname3 to keep the flat structure. But I was thinking a nested structure makes sense for this kind of data. The problem is that I don't understand if such a structural change is compatible with the plugin code.
What kinds of changes would need to be made to the plugin code to make use of nested elements?
I was stumped on this myself, and I don't know if you still are, but for reference to others here's what I found.
You are right in assuming that the reader creates a separate table for it. Since a DataSet can hold multiple tables, each "level" of elements gets its own table, and any nested elements that have nested elements of their own get their own table as well. Essentially, it keeps creating tables until it reaches the bottom of the XML tree. If an element has no children, it gets added as a cell in its parent's data table.
In your case,
dataSet.Tables[0] will hold the top-level nodes (the <profiles> element). But since the nested element <profile> has elements of its own, Tables[0] will likely only have one row. The next level deeper, dataSet.Tables[1], will hold all <profile> nodes. However, since <subsectors> has the sub-element <subsector>, it will not be in Tables[1], but rather in Tables[2], which goes yet a level deeper.
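For reference, here is a small sketch (in C#, while the plugin itself is VB.NET) that dumps the tables and the DataRelations that ReadXml infers for XML like the sample above; the exact table layout is whatever ReadXml derives from the element names.

using System;
using System.Data;
using System.Xml;

var dataSet = new DataSet();
using (var reader = new XmlTextReader("http://www.example.com/xml_output.php"))
{
    dataSet.ReadXml(reader);
}

// One table per "level" of the XML, e.g. "profile", "subsectors", "subsector".
foreach (DataTable table in dataSet.Tables)
    Console.WriteLine($"Table: {table.TableName} ({table.Rows.Count} rows)");

// ReadXml links parent and child tables with auto-generated relations
// (backed by hidden key columns such as "profile_Id").
foreach (DataRelation relation in dataSet.Relations)
    Console.WriteLine($"Relation: {relation.ParentTable.TableName} -> {relation.ChildTable.TableName}");

// Walking from a profile row to its related child rows through those relations:
foreach (DataRow profile in dataSet.Tables["profile"].Rows)
    foreach (DataRelation relation in dataSet.Tables["profile"].ChildRelations)
        Console.WriteLine($"{profile["cname"]} has {profile.GetChildRows(relation).Length} {relation.ChildTable.TableName} row(s)");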
I know it has been a while since this was asked, but hopefully this will be helpful.
Let me first describe the situation. We host many Alumni events over the course of each year and provide online registration forms for each event. There is a large chunk of data that is common for each event:
An Event with dates, times, managers, internal billing info, etc.
A Registration record with info about the payment and total amount charged per form submission
Bio/Demographic and alumni data about the 1 or more attendees (name, address, degree, etc.)
We store all of the above data within columns in tables as you would expect.
The trouble comes with the 'extra' fields we are asked to put on the forms. Maybe it is a dinner and there is a Veggie or Carnivore option, perhaps there is lodging and there are bed or smoking options, or perhaps there is an optional transportation option. There are tons of weird little "can you add this to the form?" types of requests we receive.
Currently, we JSONify any non-standard data and store it all in one column (per attendee) called 'extras'. We can read this data out in code but it is not well suited to querying. Our internal staff would like to generate a quick report on Veggie dinners needed for instance.
Other than creating a separate table for each form that holds the specific 'extra' data items, are there any other approaches that could make my life (and reporting) easier? Anyone working in a similar environment?
This is actually one of the toughest problems to solve efficiently. The SQL Server Customer Advisory Team has dedicated a white paper to the topic which I highly recommend you read: Best Practices for Semantic Data Modeling for Performance and Scalability.
You basically have 3 options:
semantic database (entity-attribute-value)
XML column
sparse columns
Each solution comes with ups and downs. Off the top of my head I'd say XML is probably the one that gives you the best balance of power and flexibility, but the optimal solution really depends on lots of factors, like data set sizes, the frequency at which new attributes are created, the actual process (human operators) that creates, populates and uses these attributes, and not least your team's skill set (some might fare better with an EAV solution, some with an XML solution). If the attributes are created/managed under a central authority and adding new attributes is a reasonably rare event, then sparse columns may be the better answer.
Well, you could also have the following DB structure:
Have a table to store custom attributes
AttributeID
AttributeName
Have a mapping table between events and attributes with:
AttributeID
EventID
AttributeValue
This means you will be able to store custom information per event, and you will be able to reuse your attributes. You can include some metadata such as
AttributeType
AllowBlankValue
to the attribute to handle it easily afterwards
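To give a feel for the reporting side, a query against such a structure could look roughly like the sketch below; the table and column names simply follow the structure above and are otherwise assumptions.

using System.Data.SqlClient;

public static class ExtraAttributeReports
{
    // Count how many times a given attribute/value combination was captured for an event,
    // e.g. CountAttributeValue(connStr, 42, "Meal", "Veggie") for the veggie-dinner report.
    public static int CountAttributeValue(string connectionString, int eventId,
                                          string attributeName, string attributeValue)
    {
        const string sql = @"
            SELECT COUNT(*)
            FROM EventAttributes ea
            JOIN Attributes a ON a.AttributeID = ea.AttributeID
            WHERE ea.EventID = @eventId
              AND a.AttributeName = @attributeName
              AND ea.AttributeValue = @attributeValue";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@eventId", eventId);
            command.Parameters.AddWithValue("@attributeName", attributeName);
            command.Parameters.AddWithValue("@attributeValue", attributeValue);
            connection.Open();
            return (int)command.ExecuteScalar();
        }
    }
}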
Have you considered using XML instead of JSON? The difference: XML is supported by SQL Server as a special data type and has query integration ;) (see the query sketch after this answer).
Quick and dirty, but actually nice for querying: simply add new columns. It's not as if the empty entries in the existing table would cost a lot.
A more database-y solution: you'll have something like an event ID in your table. You can link this to an n:m table connecting events to additional fields, and then store the additional field data in a table with additional_field_id, record_id (from the original table) and the actual value. This probably creates ugly queries, but seems politically correct in terms of database design.
I understand "NoSQL" (not only SQL ;) databases like CouchDB let you store arbitrary fields per record, but since you're already on SQL Server, I guess that's not an option.
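Picking up the XML suggestion: once the 'extras' live in an xml column, SQL Server's xml methods let you query them server-side. A rough sketch, where the Attendees table, the Extras column and the /extras/meal path are made-up names:

using System.Data.SqlClient;

public static class ExtrasXmlReports
{
    // Count attendees whose <extras> XML stores a given meal choice, e.g. "Veggie".
    public static int CountMealChoice(string connectionString, string mealChoice)
    {
        const string sql = @"
            SELECT COUNT(*)
            FROM Attendees
            WHERE Extras.value('(/extras/meal)[1]', 'nvarchar(50)') = @value";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.AddWithValue("@value", mealChoice);
            connection.Open();
            return (int)command.ExecuteScalar();
        }
    }
}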
This is the solution that we first proposed in ASP.NET Forums (which later became Community Server), and that the ASP.NET team built a similar version of into the ASP.NET 2.0 Membership system when they released it:
Property Bags on your domain objects
For example:
Event.Profile() or in your case, Event.Extras().
Basically, a property bag is a serialized collection of data stored as name/value pairs in a column (or columns). The ASP.NET 2.0 Membership went the route of storing names in a semicolon-delimited list, and the values in the same way:
Table: aspnet_Profile
Column: PropertyNames (separated by semi-colons, and has start index and end index)
Column: PropertyValues (separated by semi-colons, and only stores the string value)
The downside to that approach is that it is all strings and has to be parsed manually (even though the membership system does it for you automatically).
More recently, my method has been to build FormCollection and NameValueCollection C# extension methods that automatically serialize the collections to an XML result. I store that XML in the table in its own column associated with that entity, and I have a deserializer C# extension on XElement that turns that data back into the collection at runtime.
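Roughly sketched, such extensions could look like the following; the method names and the XML shape are illustrative, not the exact extensions described above.

using System.Collections.Specialized;
using System.Xml.Linq;

public static class PropertyBagExtensions
{
    // Serialize a NameValueCollection into an XML fragment suitable for storing in a column.
    public static XElement ToXml(this NameValueCollection bag, string rootName = "extras")
    {
        var root = new XElement(rootName);
        foreach (string key in bag.AllKeys)
        {
            root.Add(new XElement("property",
                new XAttribute("name", key),
                bag[key]));
        }
        return root;
    }

    // Deserialize the XML fragment back into a NameValueCollection at runtime.
    public static NameValueCollection ToNameValueCollection(this XElement root)
    {
        var bag = new NameValueCollection();
        foreach (var property in root.Elements("property"))
        {
            bag.Add((string)property.Attribute("name"), property.Value);
        }
        return bag;
    }
}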
This gives you the power of actually querying those properties in the XML via SQL (though that can be slow - always flatten out your read-only data).
A final note on runtime querying: the general rule we follow is that if you are going to query a property of an entity in normal application logic, then you move that property to an actual column on the table and create the appropriate indexes. If that data will never be queried directly (for example, via LINQ to SQL or EF), then leave it in the XML property bag.
Property Bags gives you the power of extending your domain models however you like, without having to modify the db schema.