Currently I'm calling .ToLower() before inserting into a collection:
site.Name = site.Name.ToLower();
collection.Insert(site);
I found an article (How to force mongo to store members in lowercase?) that forces member names to be lowercase, but I can't find any info on forcing the values to be lowercase.
Doing so, be it manually or in an automated manner, seems rather dangerous, because it's a lossy operation. There are cases when transforming data on insert is appropriate, e.g. when normalizing strings to be better searchable, but generally speaking, I'd say it's a bad idea. Transform data on read from the client or on read from the database.
An alternative is to add a computed field:
public string NormalizedName {
get { return Name.ToLowerInvariant(); }
set { } // also hacky,
}
Especially in string searches this can also be used to remove or replace problematic UTF-8 characters.
I am generating SQL code for different types of databases. To do that dynamically, certain parameters of the SQL script are stored in variables.
One such stored parameter is the comparison expression for certain queries.
Let's say I have a Dogs table with Name, DateOfBirth and Gender columns; then I have comparison expressions in variables such as:
string myExpression = "Gender=1";
string myExpression2 = "Gender=1 AND Name='Bucky'";
I would build the following SQL string then:
string mySqlString = "SELECT * FROM \"dbo\".\"Dogs\" WHERE " + myExpression;
The problem is that for Oracle syntax I have to quote the column names (as seen with "dbo"."Dogs" above). So I need to create a string from the stored expression which looks like:
string quotedExpression = "\"Gender\"=1";
Is there a fast way to do this? I was thinking of splitting the string at the comparison symbol, but then I would lose the symbol itself, and it wouldn't work on complex conditions either. I could iterate through the whole string, but that would involve a lot of conditions to check (the comparison symbol can be more than one character (<>) or a keyword (ANY, ALL, etc.)), and I'd rather avoid lots of loops.
IMO the problem here is the attempt to use myExpression / myExpression2 as naked SQL strings. In addition to being a massive SQL-injection hole, it causes problems like you're seeing now. When I need to do this, I treat the filter expression as a DSL, which I then parse into an AST (using something like a modified shunting yard algorithm - although there are other ways to do it). So I end up with
AND
  =
    Gender
    1
  =
    Name
    'Bucky'
Now I can walk that tree (visitor pattern), looking at each node. 1 looks like an integer (int.TryParse etc), so we can add a parameter with that value. 'Bucky' looks like a string literal (via the quotes), so we can add a string-based parameter with the value Bucky (no quotes in the actual value). The other two are non-quoted strings, so they are column names. We check them against our model (white-list), and apply any necessary SQL syntax such as escaping - and perhaps aliasing (it might be Name in the DSL, but XX_Name2_ChangeMe in the database). If the column isn't found in the model: reject it. If you can't understand an expression completely: reject it.
Yes, this is more complex, but it will keep you safe and sane.
There may be libraries that can already do the expression parsing (to AST) for you.
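To make the tree walk a bit more concrete, here is a minimal sketch of just the visiting step. It assumes the parser has already produced the AST. The node types (Column, Literal, Binary), the SqlFilterBuilder class and the @p0-style placeholders are all made up for illustration; placeholder syntax differs per provider (Oracle uses :p0, for example), so treat this as a starting point rather than a finished implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical AST node types; a real parser would produce something similar.
public abstract class Node { }
public sealed class Column : Node { public string Name; }
public sealed class Literal : Node { public object Value; }
public sealed class Binary : Node { public string Op; public Node Left, Right; }

public sealed class SqlFilterBuilder
{
    // White-list: DSL column name -> real column name in the database.
    private readonly IDictionary<string, string> _columns;

    // Only operators we explicitly understand are allowed through.
    private static readonly HashSet<string> Operators =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "AND", "OR", "=", "<>", "<", ">", "<=", ">=" };

    // Values collected in order; bind them to @p0, @p1, ... on the command.
    public List<object> Parameters { get; } = new List<object>();

    public SqlFilterBuilder(IDictionary<string, string> columns)
    {
        _columns = columns;
    }

    public string Visit(Node node)
    {
        switch (node)
        {
            case Binary b:
                if (!Operators.Contains(b.Op))
                    throw new ArgumentException("Unknown operator: " + b.Op);
                return "(" + Visit(b.Left) + " " + b.Op.ToUpperInvariant() + " " + Visit(b.Right) + ")";
            case Column c:
                // Reject anything that is not in the model.
                if (!_columns.TryGetValue(c.Name, out string real))
                    throw new ArgumentException("Unknown column: " + c.Name);
                // Quote the identifier, doubling any embedded quote.
                return "\"" + real.Replace("\"", "\"\"") + "\"";
            case Literal l:
                // Literals never end up in the SQL text, only as parameters.
                Parameters.Add(l.Value);
                return "@p" + (Parameters.Count - 1);
            default:
                throw new ArgumentException("Unsupported node type");
        }
    }
}
```

For the tree above, Visit returns (("Gender" = @p0) AND ("Name" = @p1)) and Parameters holds 1 and Bucky, which you would then attach to the command as real parameters.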
I am saving the following XML to DocumentDB:
<DocumentDbTest_Countries>
<country>C25103657983</country>
<language>C25103657983</language>
<countryCode>383388823</countryCode>
<version>2015-08-25T08:36:59:982.3552</version>
<integrity>
<hash-algorithm>sha1</hash-algorithm>
<hash />
</integrity>
<context-info>
<created-by>unittestuser</created-by>
<created-on>2015/08/25 08:36:59</created-on>
<created-time-zone>UTC</created-time-zone>
<modified-by>unittestuser</modified-by>
<modified-on>2015/08/25 08:36:59</modified-on>
<modified-time-zone>UTC</modified-time-zone>
</context-info>
</DocumentDbTest_Countries>
It gets saved to DocumentDB without problems, as follows:
{
"DocumentDbTest_Countries": {
"integrity": {
"hash-algorithm": "sha1",
"hash": ""
},
"context-info": {
"created-by": "unittestuser",
"created-on": "2015/08/25 08:36:59",
"created-time-zone": "UTC",
"modified-by": "unittestuser",
"modified-on": "2015/08/25 08:36:59",
"modified-time-zone": "UTC"
},
"country": "C25103657983",
"language": "C25103657983",
"countryCode": 383388823,
"version": "2015-08-25T08:36:59:982.3552"
},
"id": "f917945d-eaee-4eff-944d-dae366de7be1"
}
As you can see, the column name is indeed saved with a hyphen (-) in it in DocumentDB (apparently without any errors, exceptions or warnings), but when I try to do a lookup it fails in the Query Explorer. It seems there is no way to search on hyphenated column names. Is this true, or am I missing something? Can someone please point me to documentation about this limitation?
For field names that use certain characters (space, "#", "-", etc.) or which conflict with SQL keywords, you have to use quoted property accessor syntax. So instead of writing:
SELECT * FROM c WHERE c.context-info.created-by = "unittestuser"
write:
SELECT * FROM c WHERE c["context-info"]["created-by"] = "unittestuser"
You can also access properties using the quoted property operator []. For example, SELECT c.grade and SELECT c["grade"] are equivalent. This syntax is useful when you need to escape a property that contains spaces, special characters, or happens to share the same name as a SQL keyword or reserved word.
- is one of those special characters, so to access a property which contains -, you need to use the quoted property operator. It is documented :)
Of course, the idiomatic way would be to use camel casing instead of hyphens, but if you don't want to change your structures, you'll need to use the quoted properties.
For example, using your test data, this query works:
SELECT c["country-code"] FROM root.DocumentDbTest_Countries c
EDIT:
The syntax of the query is a bit confusing, which is what led to most of your problems. Contrary to what you might think,
select * from DocumentDbTest_Countries
doesn't in fact mean "get me all the data in DocumentDbTest_Countries". Instead, it seems to mean "get me all the data in the current collection, and alias it as DocumentDbTest_Countries". This is obvious when you look at the data returned - you'd expect it to return only the fields inside of DocumentDbTest_Countries, but it actually returns all of the values, including the id (which is not a part of DocumentDbTest_Countries - should have been obvious earlier :D).
I don't understand why it's designed this way (even using DocumentDbTest_Countries c to explicitly specify an alias doesn't select DocumentDbTest_Countries), but the fix is to actually start the identifier with the collection name. root is just a way to refer to "this collection", so
select * from root.DocumentDbTest_Countries
returns what you'd expect from the original query. Unless you figure out why the original query behaves the way it does, I'd stick with explicitly using root (or a collection name) as the root every time. It seems to me that using from whatever will always return the current collection, unless you have a collection named whatever - a weird design decision, if you ask me. This means that unless you have a collection named lotsOfFun, the following works the same as using root:
select * from lotsOfFun.DocumentDbTest_Countries
Maybe it's because the top-level object is not named, so they decided that whatever name will work just as well, but that's just an idea.
Well the trick was to use CollectionName.DocumentName instead of just the DocumentName, like this (thanks to #Laan for pointing me in that direction) :):
SELECT * FROM TestProject.DocumentDbTest_Countries c where c["#country"] = "C26092630539"
But then I still miss the Document.Id and Document.SelfLink data in the return Document data.
What I'm trying to do is use a UriBuilder and HttpUtility.ParseQueryString to get the last page the user visited and then parse the URL to get just the productID. (The product ID is different on each page, if that matters.)
example URL: website.com/stuff/?referrerPage=1&productID=1234567&tab=Tile
and what I want is just the 1234567
Page_Load is where I parse the URL:
protected void Page_Load(object sender, EventArgs e)
{
NameValueCollection query = HttpUtility.ParseQueryString(uriBuilder.Query);
//I want to take the parse string and get productID here, how?
}
grabURL is where I get the last URL visited:
public grabUrl(string Uri)
{
UriBuilder uriBuilder = new UriBuilder(Request.UrlReferrer);
return uriBuilder.Uri;
}
Am I on the right track with my code? How do I put the productID number in something so I can work with it? I'm very new to C# and this type of coding in general... when I say new I mean I've been doing it for about a week. So any detailed explanations or examples will be very much appreciated. Thanks everyone for being so helpful, I'm learning a lot from this site to get me on the right track.
From a NameValueCollection you can then do:
var id = query["ProductID"];
You can use int.TryParse to turn it into an integer proper.
int id = 0;
if (!string.IsNullOrEmpty(query["ProductID"]) &&
int.TryParse(query["ProductID"], out id)) {
// use id here..
}
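Putting the pieces together, a rough sketch of the whole lookup could look like the following. The ReferrerQuery class and GetProductId method are names I made up for the example, and it assumes you pass in an absolute Uri such as Request.UrlReferrer (which can be null when the browser sends no referrer).

```csharp
using System;
using System.Collections.Specialized;
using System.Web;

public static class ReferrerQuery
{
    // Returns the productID from the given URL, or null if it is
    // missing or not a valid integer.
    public static int? GetProductId(Uri referrer)
    {
        if (referrer == null) return null; // no referrer was sent

        // Uri.Query includes the leading '?'; ParseQueryString copes with that.
        NameValueCollection query = HttpUtility.ParseQueryString(referrer.Query);

        int id;
        return int.TryParse(query["productID"], out id) ? id : (int?)null;
    }
}
```

In Page_Load you would call something like ReferrerQuery.GetProductId(Request.UrlReferrer) and check the result for null before using it.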
Or you could just read the query string value of the current request from Request.QueryString:
protected void Page_Load(object sender, EventArgs e)
{
// Request.QueryString["productID"] returns a string (or null if the key is missing),
// so it has to be parsed before it can be used as an int
int prodId;
if (int.TryParse(Request.QueryString["productID"], out prodId))
{
// do something with prodId
}
}
Could you use a regex to parse it instead?
string uri = "website.com/stuff/?referrerPage=1&productID=1234567&tab=Tile";
var rgx = new Regex("productID=(?<pid>[0-9]+)", RegexOptions.IgnoreCase);
string pid = rgx.Match(uri).Groups[1].Value;
Edit: Providing context as it has been suggested I should do:
The reason for mentioning this option is that while it doesn't use HttpUtility.ParseQueryString, your original question was very specific:
get just the productID
from
the last page the user visited
(which I understand not to be the uri of the current request). Additionally, the Uri you provided was in string format.
The approach in your question does this:
1. Takes Uri (a string variable).
2. Passes it to UriBuilder; UriBuilder in its constructor initialises a Uri, which itself does a ton of work to validate the uri, populate properties etc. by creating more strings, among other things.
3. Then, from the Uri object generated, you now work with the Query property - this is one of the new strings that Uri has allocated.
4. Passes that to HttpUtility.ParseQueryString. This creates a HttpValueCollection, which itself iterates character-by-character over the string you pass in to parse out the key-value pairs in the query string, and sticks them into a NameValueCollection, which ultimately stores your values in an ArrayList - one of the least efficient collections in .NET (see various references including this one - What's Wrong with an ArrayList?) as it stores everything in an object array, requiring casting every time you get things back out.
5. Finally, you then search that collection by a key to get your product id back out.
That's a whole lot of string and character allocations, casting to and from objects, putting things into indexed arrays which you then scan, etc. just to get:
a string which is identifiable by a pattern from another string.
While I admit that memory is cheap, it seems that there might be an equally good, or better, alternative. This is what regex was made for - find a pattern in a string, and allow you to get parts of that pattern back out.
So, your options:
1. If you just want to get productID out of a uri in an exact form, and the uri is originally in a string, then I maintain that a regex is a very good, efficient choice. This will also work if you want to extract other patterns from your uri.
2. If you want to know all the keys in your query string as well as values for specific keys, then use your HttpUtility.ParseQueryString approach; NameValueCollection allows you to get access to the keys through the AllKeys property.
3. If you want to get the value of a query string parameter for the uri of your current request then Marcianin's answer is the simplest choice, and you can forget the first 2 options.
In all cases, once you have the string you can parse it using the parse methods on int, but if you use 2. or 3. your extracted id may not be a number (in the case of a malicious request) so make sure you use int.TryParse not int.Parse to convert from a string, and be careful to catch exceptions. You should always take care when taking input from query strings so as not to fall foul of malicious data in the query string (which will hit your website frequently once it is online).
The choice is personal preference.
The actual code you write is about the same - each approach takes up very few lines.
Code should never prematurely optimise, but it should also not be deliberately wasteful if it can help it. You could performance test one against the other, but that would be a waste of your time at this stage.
However, code should also convey intent. The Regex approach, even I would argue, doesn't convey your intent as well as the ParseQueryString approach.
Footnote: I would change the regex slightly to "[?&]productID=(?<pid>[0-9]+)" to ensure you pick up only "productID" and not "fooProductID"
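For completeness, here is a small sketch of the regex approach with the stricter pattern from the footnote and int.TryParse combined. ProductIdRegex and TryGetProductId are just names picked for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ProductIdRegex
{
    // [?&] makes sure we match the productID key itself and not e.g. fooProductID.
    private static readonly Regex Pattern =
        new Regex("[?&]productID=(?<pid>[0-9]+)", RegexOptions.IgnoreCase);

    // Returns true and sets productId when the uri contains a numeric productID.
    public static bool TryGetProductId(string uri, out int productId)
    {
        productId = 0;
        Match match = Pattern.Match(uri ?? string.Empty);

        // TryParse still matters: a long run of digits can overflow an int.
        return match.Success && int.TryParse(match.Groups["pid"].Value, out productId);
    }
}
```

Called with the example URL from the question, this should hand back 1234567, while a URL that only contains fooProductID=5 is not matched.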
Most importantly, you asked
Am I on the right track with my code?
I would say you are. Always weigh up different options as you proceed. Don't be afraid to try different things out. You say you are new to C#. The one thing you may have missed, in this case, is writing a test to help you on your way before you wrote your code: if the test passes, the code is correct, and the approach you chose is secondary to that. Visual Studio makes testing easy for you if you are using the latest version. If you get into good habits early on, it will pay dividends later on in your C# career.
How do I put the productID number in something so I can work with it?
Grant Thomas answered this perfectly - int.TryParse turns the string into a number.
I am trying to read a value from a DB using C#.
The query string contains multiple single quotes - such as: Esca'pes' (the query strings are being read from a text file)
So, I wanted to replace all the single quotes with two single quotes before forming the SQL query. My code is as below:
if (name.Contains('\''))
{
name = name.Replace('\'','\''');
}
How to fix this?
Use strings, not char literals.
name = name.Replace("'", "''");
However it sounds like you're concatenating SQL strings together. This is a huge "DO NOT" rule in modern application design because of the risk of SQL injection. Please use SQL parameters instead. Every modern DBMS platform supports them, including ADO.NET with SQL Server and MySQL, even Access supports them.
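As a rough illustration of what parameters look like with the provider-neutral ADO.NET interfaces: the Items table, the DogQueries class and the @name placeholder below are assumptions for the example (SQL Server style; other providers use a different prefix, e.g. :name for Oracle).

```csharp
using System.Data;

public static class DogQueries
{
    // Builds a command that looks a row up by name without ever
    // putting the value into the SQL text itself.
    public static IDbCommand CreateFindByName(IDbConnection connection, string name)
    {
        IDbCommand command = connection.CreateCommand();
        command.CommandText = "SELECT * FROM Items WHERE Name = @name"; // hypothetical table

        IDbDataParameter parameter = command.CreateParameter();
        parameter.ParameterName = "@name";
        parameter.Value = name; // Esca'pes' is passed as-is, no quote doubling needed
        command.Parameters.Add(parameter);

        return command;
    }
}
```

The value travels separately from the SQL text, so there is nothing to escape and the Replace call is no longer needed.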
name = name.Replace("'","''");
On an unrelated note, you're concatenating strings for use in SQL? Try parameters instead, that's what they're meant for. You're probably making it harder than it needs to be.
Since you want to replace a single character with two characters, you need to use the String overload of Replace
if (name.Contains('\''))
{
name = name.Replace("'","''");
}
(Note: single quotes don't require escaping in Strings like they do in character notation.)
Disclaimer: I KNOW that in 99% of cases you shouldn't "serialize" data in a concatenated string.
What character do you use as the delimiter in this well-known situation:
string str = userId +"-"+ userName;
In the majority of cases I have fallen back to | (pipe), but in some cases users type even that. What about "non-typable" characters like ☼ (ALT+9999)?
That depends on too many factors to give a concrete answer.
Firstly, why are you doing this? If you feel the need to store the userId and userName by combining them in this fashion, consider alternative approaches, e.g. CSV-style quoting or similar.
Secondly, under normal circumstances only delimiters that aren't part of the strings should be used. If userId is just a number then "-" is fine... but what if the number could be negative?
Third, it depends on what you plan to do with the string. If it is simply for logging, debugging, or some other form of human consumption, then you can relax a bit about it and just choose a delimiter that looks appropriate. If you plan to store data like this, use a delimiter that ensures you can extract the data properly later on, regardless of the values of userId or userName. If you can get away with it, use \0 for example. If either value comes from an untrusted source (i.e. the Internet), then make sure the delimiter can't be used as a character in either string. Generally you would limit the characters that each contains - say, digits for userId and letters, digits and SOME punctuation characters for userName.
If it's for data storage and retrieval, there is no way to guarantee that a user won't find a way to inject your delimiter into the string. The safe thing to do is pre-process the input somehow:
Let - be the special character
If a - is encountered in the input, replace it with something like -0.
Use -- as your delimiter
So userid = "alpha-dog" and userName = "papa--0bear" will be translated to
alpha-0dog--papa-0-00bear
The important thing is that your scheme needs to be perfectly undoable, and that the user shouldn't be able to break it, no matter what they enter.
Essentially this is a very primitive version of sanitization.
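The scheme above is small enough to sketch in a few lines. DelimitedPair and its method names are made up for the example; the only things taken from the description are the - to -0 replacement and the -- delimiter.

```csharp
using System;

public static class DelimitedPair
{
    // After escaping, every '-' is followed by '0', so "--" can never
    // occur inside an escaped value.
    private static string Escape(string value) => value.Replace("-", "-0");

    // Every '-' in escaped text starts a "-0" pair, so this undoes Escape exactly.
    private static string Unescape(string value) => value.Replace("-0", "-");

    public static string Join(string userId, string userName) =>
        Escape(userId) + "--" + Escape(userName);

    public static string[] Split(string joined)
    {
        // The first "--" is always the delimiter, because the escaped
        // userId contains no "--" and cannot end with '-'.
        int index = joined.IndexOf("--", StringComparison.Ordinal);
        if (index < 0) throw new FormatException("Delimiter not found.");

        return new[]
        {
            Unescape(joined.Substring(0, index)),
            Unescape(joined.Substring(index + 2))
        };
    }
}
```

Join("alpha-dog", "papa--0bear") gives alpha-0dog--papa-0-00bear, and Split on that string gives back the two original values.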