DocumentDb cannot handle hyphen (-) in column names

I am saving the following XML to DocumentDB:
<DocumentDbTest_Countries>
  <country>C25103657983</country>
  <language>C25103657983</language>
  <countryCode>383388823</countryCode>
  <version>2015-08-25T08:36:59:982.3552</version>
  <integrity>
    <hash-algorithm>sha1</hash-algorithm>
    <hash />
  </integrity>
  <context-info>
    <created-by>unittestuser</created-by>
    <created-on>2015/08/25 08:36:59</created-on>
    <created-time-zone>UTC</created-time-zone>
    <modified-by>unittestuser</modified-by>
    <modified-on>2015/08/25 08:36:59</modified-on>
    <modified-time-zone>UTC</modified-time-zone>
  </context-info>
</DocumentDbTest_Countries>
It is saved to DocumentDB without problems, as follows:
{
  "DocumentDbTest_Countries": {
    "integrity": {
      "hash-algorithm": "sha1",
      "hash": ""
    },
    "context-info": {
      "created-by": "unittestuser",
      "created-on": "2015/08/25 08:36:59",
      "created-time-zone": "UTC",
      "modified-by": "unittestuser",
      "modified-on": "2015/08/25 08:36:59",
      "modified-time-zone": "UTC"
    },
    "country": "C25103657983",
    "language": "C25103657983",
    "countryCode": 383388823,
    "version": "2015-08-25T08:36:59:982.3552"
  },
  "id": "f917945d-eaee-4eff-944d-dae366de7be1"
}
As you can see, the column names are indeed saved with a hyphen (-) in them (apparently without any errors, exceptions, or warnings), but when I try to do a lookup in the Query Explorer, it fails. It seems there is no way to search on hyphenated column names. Is this true, or am I missing something? Can someone please point me to documentation about this limitation?

For field names that use certain characters (space, "#", "-", etc.) or which conflict with SQL keywords, you have to use quoted property accessor syntax. So instead of writing:
SELECT * FROM c WHERE c.context-info.created-by = "unittestuser"
write:
SELECT * FROM c WHERE c["context-info"]["created-by"] = "unittestuser"
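If the query text is assembled in C#, the quoted accessors can be generated instead of typed by hand. This is only a minimal sketch: DocumentDbPath and Quote are hypothetical names, not part of any DocumentDB SDK, and the backslash escaping of quotes inside a property name is an assumption that should be checked against the query documentation.

```csharp
using System;
using System.Linq;

public static class DocumentDbPath
{
    // Builds a quoted property path such as c["context-info"]["created-by"].
    // Hypothetical helper: backslashes and double quotes inside a name are
    // backslash-escaped, which is an assumption about the literal syntax.
    public static string Quote(string alias, params string[] propertyNames)
    {
        if (string.IsNullOrEmpty(alias))
            throw new ArgumentException("An alias is required.", nameof(alias));

        var segments = propertyNames.Select(name =>
            "[\"" + name.Replace("\\", "\\\\").Replace("\"", "\\\"") + "\"]");

        return alias + string.Concat(segments);
    }
}
```

For example, DocumentDbPath.Quote("c", "context-info", "created-by") produces c["context-info"]["created-by"], which can be placed in the WHERE clause shown above.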

You can also access properties using the quoted property operator []. For example, SELECT c.grade and SELECT c["grade"] are equivalent. This syntax is useful when you need to escape a property that contains spaces, special characters, or happens to share the same name as a SQL keyword or reserved word.
The hyphen (-) is one of those special characters, so to access a property which contains a hyphen, you need to use the quoted property operator. It is documented :)
Of course, the idiomatic way would be to use camel casing instead of hyphens, but if you don't want to change your structures, you'll need to use the quoted properties.
For example, using your test data, this query works:
SELECT c["context-info"] FROM root.DocumentDbTest_Countries c
EDIT:
The syntax of the query is a bit confusing, which is what led to most of your problems. Contrary to what you might think,
select * from DocumentDbTest_Countries
doesn't in fact mean "get me all the data in DocumentDbTest_Countries". Instead, it seems to mean "get me all the data in the current collection, and alias it as DocumentDbTest_Countries". This is obvious when you look at the data returned - you'd expect it to return only the fields inside of DocumentDbTest_Countries, but it actually returns all of the values, including the id (which is not a part of DocumentDbTest_Countries - should have been obvious earlier :D).
I don't understand why it's designed like this (even using DocumentDbTest_Countries c to explicitly specify an alias doesn't select DocumentDbTest_Countries), but the fix is to actually start the identifier with the collection name. root is just a way to refer to "this collection", so
select * from root.DocumentDbTest_Countries
returns what you'd expect from the original query. Unless you figure out why the original query behaves the way it does, I'd stick with explicitly using root (or a collection name) as the root every time. It seems to me that using from whatever will always return the current collection, unless you have a collection named whatever - a weird design decision, if you ask me. This means that unless you have a collection named lotsOfFun, the following works the same as using root:
select * from lotsOfFun.DocumentDbTest_Countries
Maybe it's because the top-level object is not named, so they decided that whatever name will work just as well, but that's just an idea.

Well, the trick was to use CollectionName.DocumentName instead of just the DocumentName, like this (thanks to @Laan for pointing me in that direction) :)
SELECT * FROM TestProject.DocumentDbTest_Countries c where c["#country"] = "C26092630539"
But then I still miss the Document.Id and Document.SelfLink data in the returned document.

Related

Fastest way to quote one side of an SQL comparison

I am generating SQL code for different types of databases. To do that dynamically, certain parameters of the SQL script are stored in variables.
One such stored parameter is the comparison expression for certain queries.
Let's say I have a Dogs table with Name, DateOfBirth and Gender columns; then I have comparison expressions in variables such as:
string myExpression = "Gender=1";
string myExpression2 = "Gender=1 AND Name='Bucky'";
I would build the following SQL string then:
string mySqlString = "SELECT * FROM \"dbo\".\"Dogs\" WHERE " + myExpression;
The problem is, that for Oracle syntax, I have to quote the column names (as seen at dbo.Dogs above). So I need to create a string from the stored expression which looks like:
string quotedExpression = "\"Gender\"=1";
Is there a fast way to do this? I was thinking of splitting the string at the comparison symbol, but then I would cut the symbol itself, and it wouldn't work on complex conditions either. I could iterate through the whole string, but that would involve a lot of conditions to check (the comparison symbol can be more than one character (<>) or a keyword (ANY, ALL, etc.)), and I'd rather avoid lots of loops.

IMO the problem here is the attempt to use myExpression / myExpression2 as naked SQL strings. In addition to being a massive SQL-injection hole, it causes problems like you're seeing now. When I need to do this, I treat the filter expression as a DSL, which I then parse into an AST (using something like a modified shunting yard algorithm - although there are other ways to do it). So I end up with
AND
  =
    Gender
    1
  =
    Name
    'Bucky'
Now I can walk that tree (visitor pattern), looking at each node. 1 looks like an integer (int.TryParse etc), so we can add a parameter with that value. 'Bucky' looks like a string literal (via the quotes), so we can add a string-based parameter with the value Bucky (no quotes in the actual value). The other two are non-quoted strings, so they are column names. We check them against our model (white-list), and apply any necessary SQL syntax such as escaping - and perhaps aliasing (it might be Name in the DSL, but XX_Name2_ChangeMe in the database). If the column isn't found in the model: reject it. If you can't understand an expression completely: reject it.
Yes, this is more complex, but it will keep you safe and sane.
There may be libraries that can already do the expression parsing (to AST) for you.
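A heavily reduced sketch of that idea in C# follows. It is not the shunting-yard parser mentioned above: it only accepts flat "column operator literal" comparisons joined by AND/OR, with no parentheses, and it emits SQL directly instead of building a tree. FilterCompiler, the columnMap white-list, and the :p0-style parameter names are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class FilterCompiler
{
    // One token: a quoted string, a number, a comparison operator, or a word.
    private static readonly Regex TokenPattern = new Regex(
        @"\G\s*('(?:[^']|'')*'|\d+|<>|<=|>=|=|<|>|\w+)");

    private static readonly HashSet<string> Operators =
        new HashSet<string> { "=", "<>", "<", ">", "<=", ">=" };

    // columnMap is the white-list: DSL column name -> trusted database column name.
    // Literal values are appended to parameters, never concatenated into the SQL.
    public static string Compile(string filter,
        IDictionary<string, string> columnMap, IList<object> parameters)
    {
        List<string> tokens = Tokenize(filter);
        var sql = new StringBuilder();
        int i = 0;
        while (true)
        {
            if (i + 3 > tokens.Count)
                throw new FormatException("Expected: column operator value.");
            string column = tokens[i], op = tokens[i + 1], value = tokens[i + 2];

            if (!columnMap.TryGetValue(column, out string dbName))
                throw new FormatException("Unknown column: " + column);
            if (!Operators.Contains(op))
                throw new FormatException("Unsupported operator: " + op);

            parameters.Add(ParseLiteral(value));
            sql.Append('"').Append(dbName).Append("\" ").Append(op)
               .Append(" :p").Append(parameters.Count - 1);

            i += 3;
            if (i == tokens.Count) break;

            string joiner = tokens[i].ToUpperInvariant();
            if (joiner != "AND" && joiner != "OR")
                throw new FormatException("Expected AND or OR, got: " + tokens[i]);
            sql.Append(' ').Append(joiner).Append(' ');
            i++;
        }
        return sql.ToString();
    }

    private static List<string> Tokenize(string text)
    {
        string trimmed = text.Trim();
        var tokens = new List<string>();
        int pos = 0;
        while (pos < trimmed.Length)
        {
            Match m = TokenPattern.Match(trimmed, pos);
            if (!m.Success)
                throw new FormatException("Unexpected input at position " + pos);
            tokens.Add(m.Groups[1].Value);
            pos += m.Length;
        }
        return tokens;
    }

    private static object ParseLiteral(string token)
    {
        if (token.Length >= 2 && token[0] == '\'')
            return token.Substring(1, token.Length - 2).Replace("''", "'");
        if (long.TryParse(token, NumberStyles.None, CultureInfo.InvariantCulture, out long number))
            return number;
        throw new FormatException("Unsupported literal: " + token);
    }
}
```

With a white-list containing Gender and Name, compiling "Gender=1 AND Name='Bucky'" yields "Gender" = :p0 AND "Name" = :p1 plus the parameter values 1 and Bucky. Anything the sketch does not understand, including unknown columns, is rejected with a FormatException.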

Alternative for regex.unescape in C# for quotes (")

So I am facing a problem here which I am sure has a simple answer, but I cannot seem to find it.
I am comparing string data from 2 tables using C# code.
When the data is null or empty in both tables, I want the comparison to return "True", which basically means they are identical.
I am using string.IsNullOrEmpty for checking null or empty conditions.
The problem is that in one table the string value is "", while the other table has the same value escaped, appearing as "\"\"".
I assumed using Regex.Unescape would solve this, but it does not seem to be working: I am getting output saying the values are different, which causes problems.
One solution I figured out is directly checking if str == "\"\"".
But are there any cleaner options?

I think you are mixing things here.
If your strings come from the same data source, then either all of them are escaped, or they are not (and if that's not the case, you have bigger problems than what you are stating).
So, if they are not escaped, and one of them contains "", and the other one contains \"\", then they are not equal, one is 2 characters in length, and the other one is 4.
So I'm assuming that they are escaped and your first string is actually empty in the database (it doesn't contain any characters), and the second one is \"\".
You can then use Regex.Unescape (if they are always escaped), but those two strings are not the same: one is empty, and the other one contains (once unescaped), "", so the first string contains no characters and the second one has two of them: no wonder they won't be compared equal.
Now, iff they are indeed escaped, it does not make sense that one contains "", because those characters should be escaped. And if this is not the case, then you have a very specific problem which is not what you asked for: you need to determine whether your string comes escaped or not from the data source... and that's basically impossible unless there's a very specific set of rules which determine so.
If the data source contains randomly escaped or not strings, imagine your data source returns a string \"\": how do you determine if the actual content is escaped and it means {'"','"'} (2 characters, each of them being a double quote), or if it isn't, and it's 4 characters, representing {'\','"','\','"'} (one backslash, one double-quote, one backslash and one double-quote)? There's just no way to tell unless you have a specification that determines those rules (or another field saying if the string is escaped or not).
So, back to your question: although you haven't posted any code, my guess is that the code is not wrong: either your expectations are wrong (you want \"\" to mean a string is empty, but it doesn't, because it just doesn't mean that), or your data is wrong.
Either way, there's no generic code solution to any of those. There are specific code solutions for specific cases (like the one you are showing), but not a generic one: with the info you gave in your question, it's just impossible.
After all this babbling, now for a specific answer, if your table A contains unescaped strings, and your table B contains escaped strings:
stringFromTableA == Regex.Unescape(stringFromTableB)
Should return true if stringFromTableA contains "" and stringFromTableB contains \"\". Check it. Neither of those will be empty, so string.IsNullOrEmpty() will return false for both.
And an update: if you are checking those string values in the Visual Studio debugger, the debugger shows them escaped, so if you are seeing "" in one and \"\" in the other, then your first string is empty (and string.IsNullOrEmpty will return true), and your second string contains two double quotes: string.IsNullOrEmpty will return false, since it is not actually null or empty. And Regex.Unescape will do nothing in this case, since your string doesn't contain any \ and there's nothing to unescape; it's just the debugger showing those \'s.
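To make the specific case concrete, here is a small sketch that assumes table A always holds raw strings and table B always holds escaped strings. EscapedComparison and AreEqual are hypothetical names. Note that Regex.Unescape throws an ArgumentException on malformed escape sequences, so input that is not reliably escaped would need extra handling.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapedComparison
{
    // Assumption: tableAValue is stored raw, tableBValue is stored escaped.
    public static bool AreEqual(string tableAValue, string tableBValue)
    {
        // Treat null and empty as identical, as the question asks.
        if (string.IsNullOrEmpty(tableAValue) && string.IsNullOrEmpty(tableBValue))
            return true;
        if (tableAValue == null || tableBValue == null)
            return false;

        // Unescape only the side that is known to be escaped.
        return tableAValue == Regex.Unescape(tableBValue);
    }
}
```

Two double-quote characters in table A compare equal to the four-character escaped form in table B, while a genuinely empty string in table A does not, which matches the distinction drawn above.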

Force mongo to store values in lowercase

Currently I'm calling .ToLower() before inserting into a collection:
site.Name = site.Name.ToLower();
collection.Insert(site);
I saw an article (How to force mongo to store members in lowercase?) that forces member names to be lowercase, but I can't find info on forcing the values to be lowercase.

Doing so, be it manually or in an automated manner, seems rather dangerous, because it's a lossy operation. There are cases when transforming data on insert is appropriate, e.g. when normalizing strings to be better searchable, but generally speaking, I'd say it's a bad idea. Transform data on read from the client or on read from the database.
An alternative is to add a computed field:
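public string NormalizedName {
    // Computed from Name on every read, so the original casing is never lost.
    get { return Name.ToLowerInvariant(); }
    // Also hacky: the empty setter is presumably there so the serializer treats this as a read/write property.
    set { }
}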
Especially in string searches this can also be used to remove or replace problematic UTF-8 characters.

How to validate an Excel expression programmatically

I have been developing an application, and one of its responsibilities is to provide the user with a page where it is possible to write math expressions the Excel way.
It is an ASP.NET MVC application, and it uses the SpreadsheetGear library to execute Excel expressions.
The page has a textarea for writing the expression and two buttons on the right. The green one validates the expression and the red one clears the textarea.
A, B and C are parameters that the application will replace with values. Notice that it is not possible to know the parameter data types. I mean, if I write a CONCATENATE function, the user has to use double quotes (") to delimit strings. For example:
CONCATENATE("A","B")
Thus, the user needs to know the functions' parameters and their data types.
My main issue is how to validate the expression.
SpreadsheetGear does not have any method to perform this validation.
The method SpreadsheetGear provides to evaluate a formula is:
string formula = "{formula from textarea}";
worksheet.EvaluateValue(formula);
and it expects a string.
As I don't know what formula the user will write, how many parameters it has, or what the parameters' data types are, it's kind of difficult to validate.
Now my question is: how could I validate the expression?
I thought of an alternative: provide the user with a page that has a textbox for each parameter in the expression. The user would be able to provide some data and validate the result of the formula. If the syntax of the formula is wrong, the application throws an exception.
It would be a good solution, but for each "process" that the user interacts with, they will write 10 or 15 formulas, so it might be a little painful.
Could anyone provide me with a good solution for that?
Thank you!

https://sites.google.com/site/akshatsharma80/home/data-structures/validate-an-arithmetic-expression
Refer to this site for validation.

This is a very late response but I have been working on expression evaluators in Excel with VBA for a while and I might shed some light. I have three solutions to try but all have limitations.
1) Ask the user to put a '$' after a variable name to signify a string (or some other unique character). Drawback is that it is not as simple as typing a single letter for a variable.
2) Assume all variables entered are double precision. Then change the variable to strings in all combinations until one combination works. Could be very time consuming to try all the combinations if the user enters lots of individual variables.
3) Assume all variables entered are double precision. But then have a list in your program of functions that require strings for parameters. Then you could parse the expression, lookup the functions in your list and then designate the parameters that require string input with a string signifier (like in step 1). This will not account for user defined functions.
Now to test out the function, replace all the numeric variables with '1' and all the string variables with "a", then EvaluateValue. If you get a result or an error signifying a calculation error, it is good.
BTW, in order to parse the expression, I suggest the following method. I do not know C#, only VB, so I will only talk in general terms.
1) Take your expression string and do a search and replace of all the typical operators with the same operator but with a backslash ("\") in front and behind the operator (you can use any other character that is not normally used in Excel formulas if you like). This will delineate these operators so that you can easily ignore them and split up your expression into chunks. Typically only need to delineate +,-,/,*,^,<,>,= and {comma}. So search for a "+" and replace it with a "\+\" and so on. For parenthesis, replace "(" and ")" with "(\\" and "\\)" respectively.
So your sample formula "SUM(A, SQRT(B, C)) * PI()" will look like this:
"SUM(\\A\,\ SQRT(\\B\,\ C\\)\\) \*\ PI(\\\\)"
You can also clean up the string a bit more by eliminating any spaces and by eliminating redundant backslashes by replacing every three consecutive backslashes with a single one (replace "\\" with "\").
2) In Visual Basic there is a command called 'Split' that can take a string like this and split it into a one dimensional array using a delimiter (in this case, the backslash). There must be an equivalent in C# or you can just make one. Your array will look like this: "SUM(", "", "A", ",", "SQRT(", "", "B", etc.
Now iterate through your array, starting at the first element and then skipping every other element. These elements will either be a number (a numeric test), a variable, a function (which will have a "(" at the end of it), a parenthesis or blank.
Now you can do other checks as you need and replace the variables with actual values.
3) When you are done, rejoin the array back into a string, without any delimiters, and try the Evaluate function.
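Since the question is about C#, here is a rough sketch of the "replace numeric variables with 1 and string variables with "a"" step. It swaps the backslash-and-Split approach described above for a single regular expression, so it is a different technique aimed at the same result. FormulaPlaceholders and the stringVariables set are hypothetical, and the sketch treats every identifier that is not followed by "(" as a variable, so cell references such as A1 and constants such as TRUE would need extra handling.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FormulaPlaceholders
{
    // Matches, in order: a string literal, a number, or an identifier.
    // The optional "call" group captures a "(" that marks a function name.
    private static readonly Regex Token = new Regex(
        @"""(?:[^""]|"""")*""|\d+(?:\.\d+)?(?:[eE][+-]?\d+)?|(?<name>[A-Za-z_][A-Za-z0-9_]*)(?<call>\s*\()?");

    // Replaces each variable with a dummy value: "a" for names listed in
    // stringVariables, 1 for everything else. Literals and function names are kept.
    public static string Substitute(string formula, ISet<string> stringVariables)
    {
        return Token.Replace(formula, m =>
        {
            bool isIdentifier = m.Groups["name"].Success;
            bool isFunctionCall = m.Groups["call"].Success;
            if (!isIdentifier || isFunctionCall)
                return m.Value;

            return stringVariables.Contains(m.Groups["name"].Value) ? "\"a\"" : "1";
        });
    }
}
```

With an empty stringVariables set, the sample formula SUM(A, SQRT(B, C)) * PI() becomes SUM(1, SQRT(1, 1)) * PI(). The substituted string could then be passed to the EvaluateValue call shown in the question to see whether it evaluates or fails.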

validating user input tags

I know this question might sound a little cheesy, but this is the first time I am implementing a "tagging" feature on one of my project sites and I want to make sure I do everything right.
Right now, I am using the very same tagging system as SO: space-separated tags, with multiple words combined by dashes (-). So when I am validating a user-input tag field, I am checking for:
1. Empty string (cannot be empty)
2. Make sure the string doesn't contain particular characters (suggestions are welcome here)
3. At least one word
4. If there is a space (there is more than one word), split the string
5. For each split part, insert into the DB
Am I missing something here, or is this roughly OK?

Split the string at " ", iterate over the parts, make sure that they comply with your expectations. If they do, put them into the DB.
For example, you can use this regex to check the individual parts:
^[-\w]{2,25}$
This would limit allowed input to consecutive strings of alphanumerics (and "_", which is part of "\w" as well as "-" because you asked for it) 2..25 characters long. This essentially removes any code injection threat you might be facing.
EDIT: In place of the "\w", you are free to take any more closely defined range of characters, I chose it for simplicity only.
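In C#, the split-and-check steps could look roughly like the sketch below. TagParser is a hypothetical name, and the choices to split on any whitespace, drop duplicates, and reject the whole input on the first bad tag are assumptions rather than requirements from the question. Note that in .NET \w also matches non-ASCII letters and digits, so a narrower character class may be preferable.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagParser
{
    // Same pattern as above: 2 to 25 word characters or dashes.
    private static readonly Regex TagPattern = new Regex(@"^[-\w]{2,25}$");

    public static IReadOnlyList<string> Parse(string input)
    {
        if (string.IsNullOrWhiteSpace(input))
            throw new ArgumentException("At least one tag is required.", nameof(input));

        var tags = new List<string>();
        // A null separator splits on any whitespace; RemoveEmptyEntries
        // absorbs leading, trailing, and repeated spaces.
        foreach (string part in input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            if (!TagPattern.IsMatch(part))
                throw new FormatException("Invalid tag: " + part);
            if (!tags.Contains(part))
                tags.Add(part);
        }
        return tags;
    }
}
```

TagParser.Parse("  asp-net   mvc ") returns the two tags asp-net and mvc, ready to be inserted one by one, while input such as "a" or "drop;table" is rejected.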

I've never implemented a tagging system, but am likely to do so soon for a project I'm working on. I'm primarily a database guy and it occurs to me that for performance reasons it may be best to relate your tagged entities with the tag keywords via a resolution table. So, for instance, with example tables such as:
TechQuestion
    TechQuestionID (pk)
    SubjectLine
    QuestionBody
TechQuestionTag
    TechQuestionID (pk)
    TagID (pk)
    Active (indexed)
Tag
    TagID (pk)
    TagText (indexed)
... you'd only add new Tag table entries when never-before-used tags were used. You'd re-associate previously provided tags via the TechQuestionTag table entry. And your query to pull TechQuestions related to a given tag would look like:
SELECT
    q.TechQuestionID,
    q.SubjectLine,
    q.QuestionBody
FROM
    Tag t INNER JOIN TechQuestionTag qt
        ON t.TagID = qt.TagID AND qt.Active = 1
    INNER JOIN TechQuestion q
        ON qt.TechQuestionID = q.TechQuestionID
WHERE
    t.TagText = @tagText
... or what have you. I don't know, perhaps this was obvious to everyone already, but I thought I'd put it out there... because I don't believe the alternative (redundant, indexed, text-tag entries) would query as efficiently.

Be sure your algorithm can handle leading/trailing/extra spaces with no trouble = )
Also worth thinking about might be a tag blacklist for inappropriate tags (profanity for example).

I hope you're doing the usual protection against injection attacks - maybe that's included under #2.
At the very least, you're going to want to escape quote characters and make embedded HTML harmless - in PHP, functions like addslashes and htmlentities can help you with that. Given that it's for a tagging system, my guess is you'll only want to allow alphanumeric characters. I'm not sure what the best way to accomplish that is, maybe using regular expressions.
