Strange symbols in SQLite text field - c#

I have a simple SQLite database where data (names) is added with a C# application. The names usually get copied and pasted from .pdf files. I found out that copying a name from a .pdf sometimes produces weird symbols. While browsing the data with DB Browser for SQLite, I saw that some records in my database have things like 'DC3', 'FS', 'US' and so on mingled in between the letters.
This breaks the WHERE clauses in my queries. For example, the following query yields 0 results:
SELECT Id FROM tblPerson WHERE Name = 'Alex Denelgo';
Can someone explain what these symbols are and how I can write a query to find all the "corrupted" name records? I can't go through them one by one manually in the browser, since the data already contains thousands of different names.

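It seems these symbols are non-printable ASCII control characters. 'DC3', 'FS' and 'US' are the ASCII names for the characters 0x13, 0x1C and 0x1F.
I found the "corrupted" records using a regex. If you have the same problem, you can use the query below to find these kinds of records. Note that core SQLite does not define a regexp() function by default, so REGEXP only works when the tool or application you run the query from provides one. The query selects all records minus the records that contain only the letters a-z (upper or lower case), spaces and dots; you can of course modify the regex for your case: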
SELECT Name FROM tblPerson
EXCEPT
SELECT Name FROM tblPerson WHERE Name REGEXP '^[A-Za-z .]+$';
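For anyone who wants to find and clean these records from code rather than from the browser, here is a minimal sketch using Python's built-in sqlite3 module. It registers its own REGEXP function, searches directly for control characters instead of whitelisting letters, and strips them. The in-memory table, the sample rows and the helper names (regexp, strip_control_chars) are made up for illustration, and it assumes tblPerson has an integer Id column as in the question's query.

```python
import re
import sqlite3

# Matches any ASCII control character (0x00-0x1F) or DEL (0x7F).
CONTROL_CHARS = r"[\x00-\x1f\x7f]"


def regexp(pattern, value):
    """Backs the SQL REGEXP operator: `X REGEXP Y` calls regexp(Y, X)."""
    if value is None:
        return 0
    return 1 if re.search(pattern, value) else 0


def strip_control_chars(text):
    """Return text with all ASCII control characters removed."""
    return re.sub(CONTROL_CHARS, "", text)


# Hypothetical in-memory copy of the table from the question.
conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE tblPerson (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO tblPerson (Name) VALUES (?)",
    [("Alex Denelgo",), ("Alex\x13 Dene\x1clgo",), ("J. Smith",)],
)

# Find every record whose name contains at least one control character.
corrupted = conn.execute(
    "SELECT Id, Name FROM tblPerson WHERE Name REGEXP ? ORDER BY Id",
    (CONTROL_CHARS,),
).fetchall()

# Clean the affected records in place.
for person_id, name in corrupted:
    conn.execute(
        "UPDATE tblPerson SET Name = ? WHERE Id = ?",
        (strip_control_chars(name), person_id),
    )

# After the cleanup the plain WHERE clause matches both records.
matches = conn.execute(
    "SELECT Id FROM tblPerson WHERE Name = 'Alex Denelgo' ORDER BY Id"
).fetchall()
```

Stripping the characters once at insert time (in the C# application) would avoid the cleanup pass altogether, but that is a design choice outside this sketch.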

Windows indexed files OLE DB search with like instead of contains

I need to replace CONTAINS with LIKE in my query. When you search for a term with dashes in it, such as "12-0430-1", CONTAINS also produces results that contain only 12 or 0430 or 1. This is intended and is also discussed by Microsoft here. The solution is in that article too, but sadly only in a form that does not help me. The solution is to replace CONTAINS with LIKE, but I always get an error with this edited query:
SELECT TOP 2000 System.FileName
FROM systemindex
WHERE DIRECTORY = 'C:\Path...'
AND (System.FullText Like ('%12-0430-1%'))
ORDER BY System.DateCreated DESC
The error says that the column does not exist, but for OLE DB search I have only found this older site which specifies which columns can be searched.
Before I did not need it because I used contains:
Ole-DB query: for 22-0130-1
SELECT TOP 2000 System.FileName
FROM systemindex
WHERE DIRECTORY = 'C:\Path'
AND (CONTAINS('"12-0430-1*"'))
ORDER BY System.DateCreated DESC
Could somebody please link me to a page listing the current columns that can be searched in Windows indexed files, or tell me which column exists and would work? I want to search through the indexed contents of the files.
Edit: these are the settings with which I create the OleDbConnection:
OleDbConnection connection =
new OleDbConnection("Provider=Search.CollatorDSO;Extended Properties='Application=Windows';");

SQL Server database truncated a big base64 string [duplicate]

How do you view ALL text from an NTEXT or NVARCHAR(max) in SQL Server Management Studio? By default, it only seems to return the first few hundred characters (255?) but sometimes I just want a quick way of viewing the whole field, without having to write a program to do it. Even SSMS 2012 still has this problem :(
I was able to get the full text (99,208 chars) out of a NVARCHAR(MAX) column by selecting (Results To Grid) just that column and then right-clicking on it and then saving the result as a CSV file. To view the result open the CSV file with a text editor (NOT Excel). Funny enough, when I tried to run the same query, but having Results to File enabled, the output was truncated using the Results to Text limit.
The work-around that @MartinSmith described as a comment to the (currently) accepted answer didn't work for me (got an error when trying to view the full XML result complaining about "The '[' character, hexadecimal value 0x5B, cannot be included in a name").
Quick trick-
SELECT CAST('<A><![CDATA[' + CAST(LogInfo as nvarchar(max)) + ']]></A>' AS xml)
FROM Logs
WHERE IDLog = 904862629
In newer versions of SSMS it can be configured in the Query/Query Options/Results/Grid/Maximum Characters Retrieved menu.
Old versions of SSMS
Options (Query Results/SQL Server/Results to Grid Page)
To change the options for the current queries, click Query Options on the Query menu, or right-click in the SQL Server Query window and select Query Options.
...
Maximum Characters Retrieved
Enter a number from 1 through 65535 to specify the maximum number of characters that will be displayed in each cell.
The maximum is, as you see, 64k. The default is much smaller.
BTW, Results to Text has an even more drastic limitation:
Maximum number of characters displayed in each column
This value defaults to 256. Increase this value to display larger result sets without truncation. The maximum value is 8,192.
I have written an add-in for SSMS and this problem is fixed there. You can use one of 2 ways:
you can use "Copy current cell 1:1" to copy original cell data to clipboard:
http://www.ssmsboost.com/Features/ssms-add-in-copy-results-grid-cell-contents-line-with-breaks
Or, alternatively, you can open cell contents in external text editor (notepad++ or notepad) using "Cell visualizers" feature: http://www.ssmsboost.com/Features/ssms-add-in-results-grid-visualizers
(The feature opens the contents of a field in any external application, so if you know that it is text, you use a text editor to open it. If the content is binary data containing a picture, you select view as picture.)
Return data as XML
SELECT CONVERT(XML, [Data]) AS [Value]
FROM [dbo].[FormData]
WHERE [UID] LIKE '{my-uid}'
Make sure you set a reasonable limit in the SSMS options window, depending on the result you're expecting.
This will work if the text you're returning doesn't contain unencoded characters, like & instead of &amp;, that will cause the XML conversion to fail.
Returning data using PowerShell
For this you will need the PowerShell SQL Server module installed on the machine on which you'll be running the command.
If you're all set up, configure and run the following script:
Invoke-Sqlcmd -Query "SELECT [Data] FROM [dbo].[FormData] WHERE [UID] LIKE '{my-uid}'" -ServerInstance "database-server-name" -Database "database-name" -Username "user" -Password "password" -MaxCharLength 10000000 | Out-File -filePath "C:\db_data.txt"
Make sure you set the -MaxCharLength parameter to a value that suits your needs.
I was successful with this method today. It's similar to the other answers in that it also converts the contents to XML, just using a different method. As I didn't see FOR XML PATH mentioned amongst the answers, I thought I'd add it for completeness:
SELECT [COL_NVARCHAR_MAX]
FROM [SOME_TABLE]
FOR XML PATH(''), ROOT('ROOT')
This will deliver a valid XML containing the contents of all rows, nested in an outer <ROOT></ROOT> element. The contents of the individual rows will each be contained within an element that, for this example, is called <COL_NVARCHAR_MAX>. The name of that can be changed using an alias via AS.
Special characters like &, < or > or similar will be converted to their respective entities. So you may have to convert &lt;, &gt; and &amp; back to their original characters, depending on what you need to do with the result.
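If you paste the XML result somewhere else and need the original text back, converting the entities is a one-liner in most languages. Here is a minimal sketch in Python using the standard library; the sample string is made up, and escape() is only used here to simulate what the XML output would contain.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical column value containing characters that XML must encode.
original = 'if (a < b && c > d) { log("R&D"); }'

# Simulate what the FOR XML output would contain for this value:
# escape() encodes &, < and > as &amp;, &lt; and &gt;.
as_seen_in_xml = escape(original)

# Convert the entities back to the original characters.
restored = unescape(as_seen_in_xml)
```

Both functions only handle &, < and > by default, which are the three characters mentioned above.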
EDIT
I just realized that CDATA can be specified using FOR XML too. I find it a bit cumbersome though. This would do it:
SELECT 1 as tag, 0 as parent, [COL_NVARCHAR_MAX] as [COL_NVARCHAR_MAX!1!!CDATA]
FROM [SOME_TABLE]
FOR XML EXPLICIT, ROOT('ROOT')
PowerShell Alternative
This is an old post and I read through the answers. Still, I found it a bit too painful to output multi-line large text fields unaltered from SSMS. I ended up writing a small C# program for my needs, but got to thinking it could probably be done using the command line. Turns out, it is fairly easy to do so with PowerShell.
Start by installing the SqlServer module from an administrative PowerShell.
Install-Module -Name SqlServer
Use Invoke-Sqlcmd to run your query:
$Rows = Invoke-Sqlcmd -Query "select BigColumn from SomeTable where Id = 123" `
-MaxCharLength 2147483647 -ConnectionString $ConnectionString
This will return an array of rows that you can output to the console as follows:
$Rows[0].BigColumn
Or output to a file as follows:
$Rows[0].BigColumn | Out-File -FilePath .\output.txt -Encoding UTF8
The result is a beautiful un-truncated text written to a file for viewing/editing. I am sure there is a similar command to save back the text to SQL Server, although that seems like a different question.
EDIT: It turns out that there was an answer by @dvlsc that described this approach as a secondary solution. I think I missed it in the first place because it was listed as a secondary answer. I am going to leave my answer, which focuses on the PowerShell approach, but wanted to at least give credit where it was due.
If you only have to view it, I've used this:
print cast(dbo.f_functiondeliveringbigformattedtext(seed) as text)
The end result is that I get line feeds and all the content in the messages window of SSMS.
Of course, it only allows for a single cell - if you want to do a single cell from a number of rows, you could do this:
declare @T varchar(max)=''
select @T=@T
+ isnull(dbo.f_functiondeliveringbigformattedtext(x.a),'NOTHINGFOUND!')
+ replicate(char(13),4)
from x -- table containing multiple rows and a value in column a
print @T
I use this to validate JSON strings generated by SQL code. Too hard to read otherwise!
Use Visual Studio Code with the SQL Server plugin. Super useful for JSON.
Alternative 1: Right Click to copy cell and Paste into Text Editor (hopefully with utf-8 support)
Alternative 2: Right click and export to CSV File
Alternative 3: Use SUBSTRING function to visualize parts of the column. Example:
SELECT SUBSTRING(fileXml,2200,200) FROM mytable WHERE id=123456
The easiest way to quickly view large varchar/text column:
declare @t varchar(max)
select @t = long_column from table
print @t

Strange Control Characters in C# DataTable results, not in SSMS or VB.NET results

I have a sql query, select distinct(name) from customers with (nolock) and it returns the text I want in SSMS, ie "Smith, John", etc.
However, when I get the string value from my DataTable in C#, I get back strange Control Characters at the beginning of my string, like \u001f\u001f\u001fSmith, John
Where is this coming from? Is it bad data in my database, or am I missing some steps related to character encoding or collation?
If anything culture- or collation-related needs to be done, I'd prefer to do it either from within the SQL query (without introducing a new SQL function) or from C#, since I can't control what values are placed in the database; I can only read them.
UPDATE:
I have another VB.NET application which queries these names for a different purpose. This other program does NOT return the printing control characters in the DataTable. This leads me to believe there is something wrong with my SQLAdapter or SQLCommand implementation. Any ideas?
The table collation is SQL_Latin1_General_CP1_CI_AS.

Optimizing SDF filesize

I recently started learning Linq and SQL. As a small project I'm writing a dictionary application for Windows Phone. The project is split into two Applications. One Application (that currently runs on my PC) generates a SDF file on my PC. The second App runs on my Windows Phone and searches the database. However I would like to optimize the data usage. The raw entries of the dictionary are written in a TXT file with a filesize of around 39MB. The file has the following layout
germanWord \tab englishWord \tab group
germanWord \tab englishWord \tab group
The file is parsed into a SDF database with the following tables.
Table Word with columns _version (rowversion), Id (int IDENTITY), Word (nvarchar(250)), Language (int)
This table contains every single word in the file. The language is a flag from my code that I used in case I want to add more languages later. A word-language pair is unique.
Table Group with columns _version (rowversion), GroupId (int IDENTITY), Caption (nvarchar(250))
This table contains the different groups. Every group is present one time.
Table Entry with columns _version (rowversion), EntryId (int IDENTITY), WordOneId (int), WordTwoId(int), GroupId(int)
This table links translations together. WordOneId and WordTwoId are foreign keys to a row in the Word Table, they contain the id of a row. GroupId defines the group the words belong to.
I chose this layout to reduce the data footprint. The raw text file contains some German (or English) words multiple times. There are around 60 groups that repeat themselves. Programmatically I reduce the word count from around 1.800.000 to around 1.100.000. There are around 50 rows in the Group table. Despite the reduced number of words, the SDF is around 80MB in file size. That's more than twice the size of the raw data. Another thing is that, in order to speed up the search for translations, I plan to index the Word column of the Word table. By adding this index the file grows to over 130MB.
How can it be that the SDF with ~60% of the original data is twice as large?
Is there a way to optimize the filesize?
The database file must contain all of the data from your raw file, in addition to row metadata -- it also will contain the strings based on the datatypes specified -- I believe your option here is NVARCHAR which uses two bytes per letter. Combining these considerations, it would not surprise me that a database file is over twice as large as a text file of the same data using the ISO-Latin-1 character set.
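To make the two-bytes-per-letter point concrete, here is a rough back-of-the-envelope sketch in Python. The sample words are made up, UTF-16 is used as a stand-in for how NVARCHAR stores characters, and the engine's own per-row and per-page overhead is not modelled at all, so treat the numbers as a lower bound only.

```python
def encoded_sizes(words):
    """Return (latin1_bytes, utf16_bytes) for the given list of words."""
    latin1 = sum(len(word.encode("latin-1")) for word in words)
    utf16 = sum(len(word.encode("utf-16-le")) for word in words)
    return latin1, utf16


# Hypothetical sample entries; the real file has over a million words.
sample = ["Haus", "house", "Straße", "street"]
latin1_bytes, utf16_bytes = encoded_sizes(sample)

# Fixed-width columns of one Entry row as described in the question:
# rowversion (8 bytes) plus four int columns (4 bytes each).
ENTRY_ROW_FIXED_BYTES = 8 + 4 * 4

print(latin1_bytes, utf16_bytes, ENTRY_ROW_FIXED_BYTES)
```

So the text alone doubles in size, and every row in Word, Group and Entry additionally carries its rowversion and int columns, before any index is added.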

Search only numbers with mysql

I have a project in ASP.NET MVC 3 and a MySQL database that contains a table of string values for phone numbers (these phone numbers can be stored as 123 456-789 or 12345 6789 or 123456789 or any other way the user enters the number) and another table with keyword data for those users.
I have a search that looks up users in the keywords table (a fulltext table), but I'm writing a method that searches the phone table instead if the search query matches a certain regular expression.
I have 2 questions:
- What could that regular expression look like on the C# side, so that it tells which method to execute (SearchByKeyword or SearchByNumber)?
- Using the same regular expression, I think, I must write the MySQL query to search the phones table... how can I do it?
I hope I have explained this well, and sorry if my English is a little bit bad.
It's best to standardise the format the data is saved in when you capture it, i.e.:
111-111-1-111-11
1111-1111-1111
111111111111
1-1-1-1-1-1-1-1-1-1-1-1
All will save as
111111111111
You can then do a simple LIKE query:
SELECT * FROM tblNumbers WHERE number LIKE '%111%'
I would say that it's a bad idea to feed a phone number from the input directly into a query variable.
A better way would be:
input -> parser, error check procedure -> MySQL query request.
Before adding the phone number to the query, you can remove all extra symbols, like (, ), -, spaces, etc.
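The input -> parser -> query flow above could be sketched like this. It is written in Python for brevity (the same regular expressions work with C#'s Regex class); the function names, the set of allowed separator characters and the %s placeholder style are assumptions for illustration, while tblNumbers and number come from the answer above.

```python
import re

# A query counts as a phone number if it consists only of digits and
# common separators: whitespace, parentheses, plus, dot and dash.
PHONE_LIKE = re.compile(r"[\d\s()+.\-]+")


def normalize_phone(raw):
    """Remove every non-digit character, e.g. '123 456-789' -> '123456789'."""
    return re.sub(r"\D", "", raw)


def choose_search(query):
    """Return the name of the search method to run for this query."""
    text = query.strip()
    if PHONE_LIKE.fullmatch(text) and re.search(r"\d", text):
        return "SearchByNumber"
    return "SearchByKeyword"


def build_number_query(query):
    """Return (sql, params) for a parameterized LIKE search on the digits."""
    digits = normalize_phone(query)
    sql = "SELECT * FROM tblNumbers WHERE number LIKE %s"
    return sql, ("%" + digits + "%",)
```

This only finds matches if the stored numbers were normalized the same way when they were saved; passing the digits as a parameter instead of concatenating them into the SQL string keeps user input out of the query text.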
