FileInfo.GetFiles() and special characters (accents) - c#

When I insert into the DB a string taken from a FileInfo.GetFiles() item that contains a special character such as "à" or "é", I get issues: SQL saves the special character split in two. Non-special characters are OK.
For instance, "à" becomes "a`", and "é" becomes "e´". Has anyone run into this kind of trouble?
Here is the code:
DirectoryInfo di = new DirectoryInfo(path);
foreach (FileInfo fi in di.GetFiles())
{
Logger.LogInfo("Info: " + fi.Name);
}
Basically, if the string is "sàrl", the log saves "Info: sa`rl".
When I hit a breakpoint in VS, I see the string with "à", but when I log it, the characters are split.
My SQL collation is Latin CI AS (SQL_Latin1_General_CP1_CI_AS) and the DB already hosts strings with special characters without problems.
Thanks folks
EDIT
I have trouble when I insert the fi.Name into the final table too:
public bool InsertFile(string fileName, Societe company, string remark, PersonnelAM creator)
{
string commandText = (@"INSERT INTO [dbo].[TB_DOCSOCIETE_COM] " +
"([IdtSOC] " +
",[NomDOC] " +
",[RemDOC] " +
",[DateDOC] " +
",[IdtPER]) " +
"VALUES " +
"(@company" +
",@fileName" +
",@remark" +
",@date" +
",@creator) SELECT @@IDENTITY");
var identity = CreateCommand(commandText,
new SqlParameter("@fileName", DAOHelper.HandleNullValueAndMinDateTime<string>(fileName)),
new SqlParameter("@company", DAOHelper.HandleNullValueAndMinDateTime<int>(company.Id)),
new SqlParameter("@remark", DAOHelper.HandleNullValueAndMinDateTime<string>(remark)),
new SqlParameter("@date", DAOHelper.HandleNullValueAndMinDateTime<DateTime>(DateTime.Now)),
new SqlParameter("@creator", DAOHelper.HandleNullValueAndMinDateTime<int>(creator.id))
).ExecuteScalar();
return int.Parse(identity.ToString()) > 0;
}
I'm using NLog, so the message column is varchar(8000), and the code that logs the message is:
public static bool LogInfo(Exception ex, string message = "")
{
try
{
GetLogger().Log(LogLevel.Info, ex, message);
}
#pragma warning disable 0168
catch (Exception exception)
#pragma warning restore 0168
{
return false;
}
return true;
}
EDIT 2 :
To be clear about the DB, these 3 lines:
Logger.LogInfo("BL1 " + "sàrl is right saved");
Logger.LogInfo("BL2 " + fi.Name + " is not right saved");
Logger.LogInfo("BL3 " + "sàrl" + " - " + fi.Name + " is not right too!");
gave me this result in the DB:
BL1 sàrl is right saved
BL2 ENTERPRISE Sa`rl - file.pdf is not right saved
BL3 sàrl - ENTERPRISE Sa`rl - file.pdf is not right too!
So it doesn't come from the DB; it is an issue with the string itself (encoding?).

varchar(8000)
Make the column NVARCHAR. This is not a collation issue. Collations determine the sort order and comparison rules, not the storage. It is true that for non-Unicode columns (varchar) the collation is used as a hint to determine the code page of the result. But a code page will only get you so far, as obviously a 1-byte code page cannot cover the entire space of file system names, which are Unicode based and use a 2-byte encoding.
Use a Unicode column: NVARCHAR.
If you want to understand what you are experiencing, just run this:
declare @a nvarchar(4000) = NCHAR(0x00E0) + N'a' + NCHAR(0x0300)
select @a, cast(@a as varchar);
Unicode is full of wonderful surprises, like Combining characters. You can't distinguish them visually, but they sure show up when you look at the actual encoded bytes.
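The same effect can be seen from the C# side. The sketch below assumes the file names arrive in decomposed form (a base letter followed by a combining accent, like the second half of the SQL example above) and that composing them before logging is acceptable. `FileNameText`, `ToComposed` and `IsComposed` are hypothetical names for illustration, not part of the original code.

```csharp
using System;
using System.Text;

static class FileNameText
{
    // Hypothetical helper: returns the name in Unicode Normalization Form C,
    // where "a" followed by U+0300 (combining grave accent) is merged into
    // the single character U+00E0.
    public static string ToComposed(string name)
    {
        return name.Normalize(NormalizationForm.FormC);
    }

    // True when the name is already in Normalization Form C.
    public static bool IsComposed(string name)
    {
        return name.IsNormalized(NormalizationForm.FormC);
    }
}
```

With something like this, `FileNameText.ToComposed(fi.Name)` could be passed to the logger instead of `fi.Name`. That only helps for characters that have a precomposed form in the column's code page, so an NVARCHAR column remains the more general fix.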

Related

Can't store an HTML page when nvarchar(max) is capped at 4000 characters

To put it simply, how do I increase the cap of nvarchar(MAX) so that it can actually hold 280MB of text and not just 8000 bytes (correct me if I'm wrong)?
So, for my final project I'm making a web crawler for a client that wants its own customized search engine for their library website, but my problem arises when I try to store the information that the crawlers retrieve.
Specifically, the problem is that even though I set the column "HTML" to nvarchar(MAX), which should be able to hold 2GB of data, it won't save any information to it (280MB in this case) because it's too long.
I did try shortening the text to be saved, and when I made it sufficiently short it finally agreed to save the data, so from what I can understand it's capped.
EDIT: Code examples as requested
page container class:
public class Page
{
public int ID = -1;
public String URL;
public String HeadLine;
public List<String> Tags;
public String Description;
public String HTML;
public DateTime lastUpdate;
}
Code snippet when crawler saves the page that it has retrieved:
//Save Page content to Database
Page page = new Page();
page.URL = url;
page.HeadLine = headline;
page.Tags = tags.Split(',').Where(s => !string.IsNullOrWhiteSpace(s)).ToList();
page.Description = description;
page.HTML = HTML;
page.lastUpdate = DateTime.Today;
new DBpage(Settings.instance.DBaddress,
Settings.instance.DBname).SavePage(page);
Method used for storing the data:
public void SavePage(Page page) {
String SqlString = "";
//Check if a page with the given URL already exists in the database and assign the SQL string accordingly
Page foundPage = GetPage(page.URL);
if(foundPage == null) {
SqlString = "INSERT INTO WebContent " +
"VALUES (@URL, @HeadLine, @Tags, @Description, @HTML, @LastUpdate)";
}
else {
SqlString = "UPDATE WebContent " +
"SET URL = @URL, HeadLine = @HeadLine, Tags = @Tags, Description = @Description, HTML = @HTML, LastUpdate = @LastUpdate " +
//"SET URL = '" + page.URL + "', HeadLine = '" + page.HeadLine + "', Tags = '" + String.Join(",", page.Tags) + "', Description = '" + page.Description + "', HTML = '" + page.HTML.Replace("'", "''") + "', LastUpdate = " + page.lastUpdate + " " +
"WHERE ID = " + foundPage.ID;
}
//Assign all variables and execute the SQL
try {
using(DBaccess db = new DBaccess(dblocation, dbname)) {
String html = page.HTML.Replace("'", "''"); //Replace all single quotes with double "single quotes" to escape the first single quote.
SqlCommand sqlCmd = db.GetSqlCommand(SqlString);
sqlCmd.Parameters.AddWithValue("@URL", page.URL);
sqlCmd.Parameters.AddWithValue("@HeadLine", page.HeadLine);
sqlCmd.Parameters.AddWithValue("@Tags", String.Join(",", page.Tags));
sqlCmd.Parameters.AddWithValue("@Description", page.Description);
sqlCmd.Parameters.AddWithValue("@HTML", html);
sqlCmd.Parameters.AddWithValue("@LastUpdate", page.lastUpdate);
sqlCmd.ExecuteNonQuery();
}
}
catch(SqlException e) {
Console.WriteLine(e.Message);
}
}
The unfortunate result that puzzles me:
The nvarchar(max) type does allow storing up to 2GB of data. For nvarchar that means about 1 billion characters, because N types store text as Unicode at 2 bytes per character.
nvarchar [ ( n | max ) ]
Variable-length Unicode string data. n defines the string length
and can be a value from 1 through 4,000. max indicates that the
maximum storage size is 2^30-1 characters. The maximum storage size in
bytes is 2 GB. The actual storage size, in bytes, is two times the
number of characters entered + 2 bytes.
Most likely your problem is somewhere in the procedure that tries to INSERT such large text. The first thing that comes to mind is some timeout. It will take a while to upload 280MB of data to the server, so examine the details of failure (look through the error messages and exceptions) to gather clues of what is going wrong.
A few things to check:
Double check the type of the HTML column in the database.
Maybe SSMS doesn't display the long value correctly. Try to run
SELECT LEN(HTML) FROM YourTable
to verify the length of the stored string.
Overall, just step through the code in the debugger and verify that all variables have expected values.
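As a rough aid for that verification step, the sketch below reports how large a string is on the client before it is sent, so the figure can be compared with what the server stores. `PayloadSize` and its methods are hypothetical names; the byte count assumes 2 bytes per UTF-16 code unit, as described for nvarchar above.

```csharp
using System;
using System.Text;

static class PayloadSize
{
    // nvarchar stores 2 bytes per UTF-16 code unit, so this is the number of
    // data bytes the string would occupy in an nvarchar column.
    public static long Utf16Bytes(string text)
    {
        return Encoding.Unicode.GetByteCount(text);
    }

    // One-line summary that can be logged just before the INSERT/UPDATE runs.
    public static string Describe(string text)
    {
        return text.Length + " chars, " + Utf16Bytes(text) + " bytes as UTF-16";
    }
}
```

Logging `PayloadSize.Describe(page.HTML)` before the command executes and comparing it with `SELECT LEN(HTML), DATALENGTH(HTML) FROM WebContent` afterwards should show whether the text was truncated in the database or never arrived at full length.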

How to display query in output

So I have a query and am trying to display it in the Debug Output. When I run the file, it gives me a list of output starting with iisexpress.exe: https://gyazo.com/fd9eb832dfcc08571b31490103b85b49
but no actual result. I am trying to run a query in Visual Studio 2015 for the first time using dotNetRDF. My code is below:
public static void Main(String[] args)
{
Debug.WriteLine("SQLAQL query example");
//Define a remote endpoint
//Use the DBPedia SPARQL endpoint with the default Graph set to DBPedia
SparqlRemoteEndpoint endpoint = new SparqlRemoteEndpoint(new Uri("http://dbpedia.org/sparql"), "http://dbpedia.org");
//SPARQL query to show countries, population, capital for countries where population is more than 100000 and limit results to 50
String queryString = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> " +
"PREFIX type: <http://dbpedia.org/class/yago/> " +
"PREFIX prop: <http://dbpedia.org/property/> " +
"SELECT ?country_name ?population ?cptl " +
"WHERE { " +
"?country rdf:type type:Country108544813. " +
"?country rdfs:label ?country_name. " +
"?country prop:populationEstimate ?population. " +
"?country dbo:capital ?cptl " +
"FILTER (?population > 1000000000) . " +
"}" +
"LIMIT 50 ";
Debug.WriteLine("queryString: [" + queryString + "]");
//Make a SELECT query against the Endpoint
SparqlResultSet results = endpoint.QueryWithResultSet(queryString);
foreach (SparqlResult result in results)
{
Debug.WriteLine(result.ToString());
}
}
Just learning SPARQL so this may be a very basic question.
Many Thanks:)
You need to make sure that your code is compiled and run in Debug mode. If it is not, then Debug.WriteLine() will have no effect. The output you have provided is incomplete; for future reference, it is better to copy and paste it into your question rather than posting a screenshot.
Since this appears to be a console application why not just use Console.WriteLine() instead?

C#: Double Carriage Return Outputting when String only has one

I have an odd issue that one would think is easy to solve. Basically I am creating a string like this:
string temp = "SET pagesize 50000;" + Environment.NewLine + "SET linesize 120;" + Environment.NewLine + sQuery.Text + Environment.NewLine + resultsQuery;
sQuery is an update statement and resultsQuery shows a breakdown of the results. Here is the results query. The update query is similarly formatted.
resultsQuery = "SELECT project_no, contact_date, SUM(CASE WHEN control_group = 'N' THEN 1 ELSE 0 END) \"CG = N\", SUM(CASE WHEN control_group = 'Y' THEN 1 ELSE 0 END) \"CG = Y\"" + "\r\n" +
"FROM OIC_TRACK_TEMP" + "\r\n" +
"WHERE job_no = " + tbJobNo.Text.Trim() + "\r\n" +
"AND cpm_customer_code = '" + lbClientID.Text.ToUpper() + "'" + "\r\n" +
"GROUP BY project_no, contact_date" + "\r\n" +
"ORDER BY contact_date, project_no;";
I then write the query to a command line:
updatessh.Write(temp);
The output in the command line looks like this:
SET pagesize 50000;
SET linesize 120;
UPDATE TABLE_NAME
SET PROJECT_NO = 'test'
.
.
How can I get rid of the double carriage returns in there? Oracle pukes when it sees them.
Thanks!
In my experience, SQL engines treat both \r and \n as newlines, so when you include them via Environment.NewLine (which is \r\n on Windows), you get both of those.
Try using \n instead of \r\n in your sql query.
You can replace the doubled carriage returns with a Regex in the final string, or remove them before the queries are added. But if you always have them, just don't add the + Environment.NewLine.
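Both suggestions above can be sketched in a few lines. This assumes the receiving end really does treat \r and \n as separate line breaks, as the answers suggest; `SqlScriptText` and its method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class SqlScriptText
{
    // Converts every CRLF pair and every lone CR to a single LF.
    public static string ToLf(string script)
    {
        return script.Replace("\r\n", "\n").Replace("\r", "\n");
    }

    // Same as ToLf, then collapses runs of two or more LFs (blank lines) into one.
    public static string ToLfNoBlankLines(string script)
    {
        return Regex.Replace(ToLf(script), "\n{2,}", "\n");
    }
}
```

In the question's code this would mean calling `updatessh.Write(SqlScriptText.ToLf(temp))` instead of writing `temp` directly. The second method is only worth using if blank lines are never meaningful in the script being sent.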

MySQL leading whitespace with C#

When I update a field in my MySQL database, it always adds a whitespace to the value.
I tried to remove the whitespace with the trim command and the replace command. Neither of them worked, so I suspect that it isn't whitespace but some obscure ASCII character. These are the commands I used:
this.foo = result.GetValue(0).ToString().Trim();
this.bar = result.GetValue(0).ToString().Replace(" ","");
The field it updates is a VARCHAR(xx). This is my MySQL update command:
MySqlCommand cmd = new MySqlCommand("UPDATE " + table + " SET " + new_field + " =' " + new_value+ "' WHERE " + field+ "= " + value + "",this.con);
this.con is my connection to the MySQL database.
FYI: I use .NET 3.5CF with a mysql.data.cf DLL in Visual Studio 2008.
Could someone help me out with this problem? It's driving me nuts.
Well yes, you've got a leading space in the SQL:
"UPDATE " + table + " SET " + new_field + " =' " + new_value+ "'
Note the bit straight after "=" - you've got a quote, then a space, then new_value.
However, you shouldn't be putting the values in the SQL directly in the first place - you should be using parameterized SQL statements... currently you've got a SQL injection attack waiting to happen, as well as potential problems for honest values with quotes in.
You should use parameterized SQL for both new_value and value here... I'm assuming that field and table come from more "trusted" sources?
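A minimal sketch of the parameterized form, written against the provider-neutral `System.Data` interfaces so it does not depend on a particular MySQL connector version. `SafeUpdate`, `BuildSql` and `Build` are hypothetical names; the `@name` placeholder style is an assumption about the connector in use, and the table and column names are still concatenated, so they are assumed to come from a trusted source.

```csharp
using System;
using System.Data;

static class SafeUpdate
{
    // Builds the statement text. Only identifiers are concatenated; the values
    // are represented by the placeholders @newValue and @value.
    public static string BuildSql(string table, string newField, string field)
    {
        return "UPDATE " + table + " SET " + newField + " = @newValue WHERE " + field + " = @value";
    }

    // Creates a command on the given connection and binds both values as parameters.
    public static IDbCommand Build(IDbConnection con, string table, string newField,
                                   string field, object newValue, object value)
    {
        IDbCommand cmd = con.CreateCommand();
        cmd.CommandText = BuildSql(table, newField, field);
        AddParameter(cmd, "@newValue", newValue);
        AddParameter(cmd, "@value", value);
        return cmd;
    }

    private static void AddParameter(IDbCommand cmd, string name, object value)
    {
        IDbDataParameter p = cmd.CreateParameter();
        p.ParameterName = name;
        p.Value = value;
        cmd.Parameters.Add(p);
    }
}
```

Because the value is bound rather than pasted between quotes, there is no place for a stray space to slip in, and quotes inside `new_value` need no escaping.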
This appears to have a space where the * is
" ='*" + new_value

String concatenation doesn't seem to work in C#

I don't know what is wrong with the following string:
"Report(" + System.DateTime.Now.ToString("dd-MMM-yyyy") + " to " + System.DateTime.Now.AddMonths(-1).ToString("dd-MMM-yyyy") + ")"
I can't get the concatenated string. I am getting Report(29-Dec-2009. That's all and
the rest gets left out from the string.
What is the reason?
Try this:
string filename =
String.Format(
"Report({0:dd-MMM-yyyy} to {1:dd-MMM-yyyy})",
System.DateTime.Now, System.DateTime.Now.AddMonths(-1));
EDIT: Since your filename was cut off at the first whitespace in the download box, you could try ONE of these:
filename = HttpUtility.UrlEncode(filename); // OR
filename = "\"" + filename + "\"";
It seems some browsers don't handle whitespace very nicely: filenames with spaces are truncated upon download. Please check whether you can download other filenames containing whitespace from other sites.
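The quoting option can be sketched as follows, assuming the file name ends up in a Content-Disposition response header (the question does not show that part). `DownloadHeader` and `ContentDisposition` are hypothetical names, and the backslash escaping of embedded quotes is an assumption about how the receiving browser parses quoted strings.

```csharp
using System;

static class DownloadHeader
{
    // Wraps the file name in double quotes so that spaces do not end the value.
    // Backslashes and embedded double quotes are backslash-escaped first.
    public static string ContentDisposition(string fileName)
    {
        string escaped = fileName.Replace("\\", "\\\\").Replace("\"", "\\\"");
        return "attachment; filename=\"" + escaped + "\"";
    }
}
```

In an ASP.NET page the result would typically be set with something like `Response.AddHeader("Content-Disposition", DownloadHeader.ContentDisposition(filename))`.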
You need to assign it to something:
string s = "Report(" + System.DateTime.Now.ToString("dd-MMM-yyyy") + " to " + System.DateTime.Now.AddMonths(-1).ToString("dd-MMM-yyyy") + ")";
Update: I just saw your update to the question. How are you displaying the string? I'm guessing that you are displaying it in a GUI and the label is too short to display the complete text.
Try this:
string newstring =
string.Format(
"Report ({0} to {1})",
System.DateTime.Now.ToString("dd-MMM-yyyy"),
System.DateTime.Now.AddMonths(-1).ToString("dd-MMM-yyyy")
);
What are you assigning the result to? It would be easier to read the code if you used string.Format
You are not assigning the concatenated result to anything, so can't use it:
string myConcatenated = "Report(" + System.DateTime.Now.ToString("dd-MMM-yyyy") + ")";
Using this code...
string test = "Report(" + System.DateTime.Now.ToString("dd-MMM-yyyy") + " to " +
System.DateTime.Now.AddMonths(-1).ToString("dd-MMM-yyyy") + ")";
I saw the following result.
Report(29-Dec-2009 to 29-Nov-2009)
It could be that the string is being truncated later on. Make sure that you set a breakpoint right after this code is run and check the value of the variable to which it is assigned (test in my case).
If, as in your previous question, you are using this value to create a file, it may be that it's the space before "to" that is causing the problem. Try to use:
"Report("
+ System.DateTime.Now.ToString("dd-MMM-yyyy")
+ "To"
+ System.DateTime.Now.AddMonths(-1).ToString("dd-MMM-yyyy")
+ ")"
instead and see if that fixes it.
If that does fix it, you'll probably need either to figure out how to quote the entire file name so it's not treated as three separate arguments ("Report(29-Dec-2009", "to" and "29-Nov-2009)"), or simply to leave your report names without spaces.
I'd choose the latter but then I'm fundamentally opposed to spaces in filenames - they make simple scripts so much harder to write :-)
