Here is the thing:
I need to display Japanese characters in a ListView in a SQL-backed database manager I am currently building for a friendly company. I tried googling, but none of the answers led anywhere. Instead of displaying the characters, it just shows "????".
But I am loading a .csv file that displays properly on a machine that has Japanese installed, and it has also been saved as UTF-8.
The font I am using is Meiryo UI; I tried Tahoma and the same thing happens. The file is loaded with the encoding specified explicitly.
And finally here's the code responsible for stuffing the data into a listview:
I would really appreciate it if someone could help me. Thanks!
You are using a StreamReader to open the file, but you are not using that same StreamReader to read the data. Instead, you are instructing SQL Server to open it with the BULK INSERT command. Prior to SQL Server 2012 SP2, there was no support for UTF-8 in BULK INSERT.
If you are using SQL Server 2012 SP2 or above, you might consider Tom-K's answer here:
How to write UTF-8 characters using bulk insert in SQL Server?
Failing that, you must either convert the file to UTF-16 before doing the bulk insert, or use another method.
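A minimal sketch of the "convert the file to UTF-16" route, written in Python rather than C#. It assumes the input is UTF-8 (with or without a BOM) and writes UTF-16 little-endian with a BOM, the form usually paired with `DATAFILETYPE = 'widechar'` in BULK INSERT. The function name and chunk size are illustrative, not from the answer. In C#, the equivalent would be a StreamReader with `Encoding.UTF8` feeding a StreamWriter with `Encoding.Unicode`.

```python
def convert_utf8_to_utf16(src_path: str, dst_path: str, chunk_chars: int = 1 << 20) -> None:
    """Re-encode a UTF-8 text file as UTF-16 little-endian with a BOM.

    Works in chunks so a very large export never has to fit in memory.
    Text-mode reads return whole characters, so a multi-byte UTF-8
    sequence is never split between two chunks.
    """
    # "utf-8-sig" also drops a leading UTF-8 BOM if the exporter wrote one.
    # newline="" on both sides keeps the original line endings (e.g. \r\n).
    with open(src_path, "r", encoding="utf-8-sig", newline="") as src, \
         open(dst_path, "w", encoding="utf-16-le", newline="") as dst:
        dst.write("\ufeff")  # byte-order mark, stored as FF FE in the output
        while True:
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            dst.write(chunk)
```

The converted file is then what BULK INSERT should be pointed at, rather than the original UTF-8 one.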
I managed to solve this. I am using SQL Server 2014 and had simply forgotten to change the collation in the database settings: it was set to Latin instead of Japanese-Unicode BIN. Thanks to Ben for pointing me in the right direction.
Fixed
UTF-8 formatted .txt file bulk insert into SQL Server Management Studio 2008 R2, nvarchar(50) columns, doesn't work properly.
This is the summary of the problem, step by step.
I can't import Turkish characters correctly, even though the .txt file is UTF-8 and the column is nvarchar(50).
Please use Unicode native format for export and bulk import, as described in http://technet.microsoft.com/en-us/library/ms189941%28v=sql.100%29.aspx
SQL Server 2008 R2 does not support code page 65001 (UTF-8 encoding) for bulk import, so try changing your code page.
Additionally, you may have trouble with your SQL Server collation configuration.
The "Data Types for Bulk Exporting or Importing SQLXML Documents" section at the link below will most probably solve your case.
http://technet.microsoft.com/en-us/library/ms188365.aspx
You can try the DATAFILETYPE options described in that section.
Hope it helps
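To make the DATAFILETYPE suggestion concrete, here is a small Python helper that builds a BULK INSERT statement for a UTF-16 data file. The helper, its defaults, and the example table name are hypothetical; the four DATAFILETYPE values and the `\t` / `\n` terminator notation are standard BULK INSERT options. The table name is assumed to be trusted and is not escaped.

```python
# DATAFILETYPE values accepted by BULK INSERT.
VALID_DATAFILETYPES = {"char", "native", "widechar", "widenative"}


def build_bulk_insert(table: str, data_file: str, datafiletype: str = "widechar",
                      field_terminator: str = r"\t", row_terminator: str = r"\n") -> str:
    """Return a T-SQL BULK INSERT statement (hypothetical helper).

    `table` must be a trusted, already-qualified name such as "dbo.Words";
    only the string literals (path, terminators) are escaped here.
    """
    if datafiletype not in VALID_DATAFILETYPES:
        raise ValueError(f"unsupported DATAFILETYPE: {datafiletype!r}")

    def quote(s: str) -> str:
        # T-SQL string literal: double any embedded single quotes.
        return "'" + s.replace("'", "''") + "'"

    return (
        f"BULK INSERT {table} FROM {quote(data_file)} WITH ("
        f"DATAFILETYPE = '{datafiletype}', "
        f"FIELDTERMINATOR = {quote(field_terminator)}, "
        f"ROWTERMINATOR = {quote(row_terminator)})"
    )
```

With the default `'widechar'`, the data file is expected to be Unicode (UTF-16), so a UTF-8 file would first need converting.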
I have two .txt files with data that I need to import into a SQL Server database in order to continue my project in Visual Studio C#. I was told to use StreamWriter/StreamReader. Can someone explain how to use them and walk me through all the steps? I am very new to this.
If you just need to insert the information by hand, you can follow the tips provided here.
If you want to do it through C#, here are some links to get you started:
First, you need to parse the text files to retrieve the data. Here's an example of how to do that.
Next, you'll need to insert the information. Here's a beginner's guide to doing that.
Good luck!
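The two steps above (parse, then insert) can be sketched as follows. This uses Python's `csv` and `sqlite3` modules as a stand-in for C# and SQL Server, and assumes a tab-delimited file with a header row and a hypothetical table `people(name, city)`. The same shape applies with a StreamReader and a parameterized SqlCommand.

```python
import csv
import sqlite3
from typing import Iterable


def import_people(conn: sqlite3.Connection, lines: Iterable[str]) -> int:
    """Parse tab-delimited lines (header row first) and insert them.

    Assumes a hypothetical table people(name TEXT, city TEXT) and a header
    row containing 'name' and 'city'. Returns the number of rows inserted.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    rows = [(r["name"], r["city"]) for r in reader]
    # Parameterized insert: values are never pasted into the SQL text.
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO people (name, city) VALUES (?, ?)", rows)
    return len(rows)
```

In practice, `lines` would be an open file, e.g. `open("people.txt", encoding="utf-8", newline="")`, so the encoding is stated explicitly rather than left to the platform default.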
I have an example here.
http://granadacoder.wordpress.com/2009/01/27/bulk-insert-example-using-an-idatareader-to-strong-dataset-to-sql-server-xml/
It was written for VS2003, but updating it to VS2010 (or 2008, or later) would be trivial.
It does not use a stream reader. It uses OleDb.
The above is a good solution if you need to do any validations on the data before inserting it.
Here is another idea.
http://www.codeproject.com/Articles/27802/Using-OleDb-to-Import-Text-Files-tab-CSV-custom
I am getting a very big file from a Linux box, which I import into SQL Server Express with the TOAD import wizard for testing.
The file is supposed to contain special characters like ÄäÖö... correctly, which the admin of the box confirms.
I am seeing only misinterpreted characters (like Ä) in PuTTY with less, in a text viewer on Windows, in TOAD's import wizard, inside the database, and when returning the values in .NET.
The only idea I have is to replace the characters in C#, but for that I would need a complete list of replacements.
Does anyone have such a list, a finished class or any other idea?
I solved the problem by converting the file on the Unix side:
iconv unicode unknown input format
Use iconv to convert UTF-8 to UTF-16, which SQL Server can import correctly.
I'm working with a CSV that contains characters like:
” and •
I am reading the CSV via OleDb, and the provider is Microsoft.Jet.OLEDB.4.0. When the data is loaded into the OleDbCommand, the characters are converted to the following, respectively:
“ and •
I suspected there might be a collation setting in the connection string but I was unable to find anything about this.
I can confirm the following:
I can see the original character in the CSV when I open it.
If I run a SELECT on the file through OleDb with WHERE [field] LIKE '%•%', I get 0 rows, but with WHERE [field] LIKE '%“%', I get rows returned.
Any thoughts?
Finally! Thanks to @HABJAN, I was able to get to the resolution, which is as simple as setting the CharacterSet in the Extended Properties of the connection string. In my situation it was UTF-8, which is commonly used by default in phpMyAdmin, where my data was retrieved from.
Resulting working connection string:
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"{0}\";Extended Properties=\"text;HDR=Yes;FMT=Delimited;CharacterSet=65001;\""
The key is CharacterSet=65001 (the code page identifier for UTF-8), which might have been obvious to some collation-savvy individuals, but I've somehow managed to avoid these issues over the years and never came across it in this respect.
I was also able to get HABJAN's solution to work by following the documentation found at http://msdn.microsoft.com/en-us/library/ms709353%28v=vs.85%29.aspx and setting the CharacterSet to the same value as above.
For my situation, this is the better method as it is a simpler/more maintainable solution, but +1 to HABJAN for helping me get there!
Thanks
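Why CharacterSet=65001 matters can be shown in a few lines of Python: the same UTF-8 bytes read with a Western-European ANSI code page come out as the familiar "â€" garbage, while reading them as code page 65001 (UTF-8) restores the original text. The assumption that the provider fell back to cp1252 is illustrative; its actual default depends on the machine's settings.

```python
# Bytes exactly as they sit in a UTF-8 CSV (e.g. exported from phpMyAdmin).
raw = "• and ”".encode("utf-8")  # b'\xe2\x80\xa2 and \xe2\x80\x9d'

# Read with an ANSI code page (cp1252 assumed as the wrong default).
# Byte 0x9D has no cp1252 mapping, hence errors="replace".
as_ansi = raw.decode("cp1252", errors="replace")  # 'â€¢ and â€\ufffd'

# Read with code page 65001, i.e. UTF-8, which is what CharacterSet=65001 requests.
as_utf8 = raw.decode("utf-8")  # '• and ”'
```

This is also why a list of character replacements is the wrong fix: the bytes are fine, only the code page used to read them is wrong.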
You can create a schema.ini file and experiment with the Format and CharacterSet properties.
Take a look at this sample: How to read data from Unicode formatted text file and import to Data Table using .Net
And here is another sample that will show you how to read csv file with schema.ini: Importing CSV file into Database with Schema.ini
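A sketch of generating such a schema.ini, assuming a comma-delimited file with a header row. The keys (`ColNameHeader`, `Format=CSVDelimited`, `CharacterSet`) follow the Text File Driver's schema.ini format, and `CharacterSet=65001` is the value reported to work above. The helper name is hypothetical, and it overwrites any existing schema.ini in that folder, including sections for other files.

```python
from pathlib import Path


def write_schema_ini(csv_path: str, code_page: int = 65001, has_header: bool = True) -> Path:
    """Write schema.ini next to csv_path (hypothetical helper).

    The text driver looks for schema.ini in the same folder as the data
    file; the section name must match the file name. 65001 = UTF-8.
    """
    csv_file = Path(csv_path)
    lines = [
        f"[{csv_file.name}]",
        f"ColNameHeader={'True' if has_header else 'False'}",
        "Format=CSVDelimited",
        f"CharacterSet={code_page}",
    ]
    ini_path = csv_file.with_name("schema.ini")
    # ASCII only: a non-ASCII file name raises UnicodeEncodeError here.
    ini_path.write_text("\n".join(lines) + "\n", encoding="ascii")
    return ini_path
```

With the file in place, the connection string no longer needs to carry the character set for that file.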
Currently we are saving files (PDF, DOC) into the database as BLOB fields. I would like to be able to retrieve the raw text of the file to be able to manipulate it for hit-highlighting and other functions.
Does anyone know of a simple way to either parse out the files and save the raw text on save, via SQL or .NET code? I have found that Adobe has a filtdump utility that converts a PDF to text, but it seems to be a command-line tool, and I don't see a way to use it with a file stream. And what would the extractor be for Office documents and other file types?
-or-
Is there a way to pull out the raw text from the SQL Full text index, without using 3rd party filters?
Note: I am trying to build a .NET and MS SQL solution without having to use a third-party tool such as Lucene.
If it isn't absolutely necessary to stream directly from SQL Server into your app, the hard part is parsing the PDF or DOC file formats.
The iTextSharp library will give you access to the innards of a PDF file:
http://itextsharp.sourceforge.net/
Here's a commercial product that claims to parse Word docs:
Aspose.Words
Edited to add:
I think you're also asking if there are ways to make SQL Server Full-text Indexing do the work for you by adding IFilters. This sounds like a good idea. I haven't done this myself, but MS has apparently supported a Word filter for a long time, and now Adobe has released a (free) PDF filter. There's a lot of information here:
Filter Central
10 Ways to Optimize SQL Server Full-text Indexing
SQL Server Full Text Search: Language Features - a little out of date but easy to understand.
The SQL Server Full-Text Search feature uses IFilters to extract plain text from PDF or Office file formats. You can install IFilters on your server, or, if your code is running on the same machine as SQL Server, you already have them.
Here is an article which shows how to use IFilters from .NET: http://www.codeproject.com/KB/cs/IFilter.aspx
From your C# application, you could open the .doc file, save it as text, and put both the text and the .doc document into the database.
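A sketch of the "store both" layout, using Python's `sqlite3` as a stand-in for SQL Server and a hypothetical `documents` table. The original bytes stay in a BLOB column (varbinary(max) on SQL Server), and the extracted text sits next to them so it can be searched and highlighted without re-parsing the file. How the text is extracted is left to the caller.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table: original file bytes plus the extracted plain text.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY, name TEXT, content BLOB, plain_text TEXT)"
    )


def save_document(conn: sqlite3.Connection, name: str, content: bytes, plain_text: str) -> int:
    """Store the file bytes and its extracted text in one row; return its id."""
    with conn:  # commit on success
        cur = conn.execute(
            "INSERT INTO documents (name, content, plain_text) VALUES (?, ?, ?)",
            (name, content, plain_text),
        )
    return cur.lastrowid


def find_documents(conn: sqlite3.Connection, term: str) -> list:
    """Names of documents whose stored text contains `term` (case-sensitive)."""
    rows = conn.execute(
        "SELECT name FROM documents WHERE instr(plain_text, ?) > 0 ORDER BY name",
        (term,),
    )
    return [name for (name,) in rows]
```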
If you are using SQL 2008, then you could consider using the new FILESTREAM feature.
Your data is stored in a varbinary(max) column, but you can also access the raw data via a regular Win32 handle.
Here's some sample code showing how to get the handle.
I had this same issue... I solved it by adding the following to my application:
EPocalipse.IFilter.dll (for everything but Office 2007 documents, due to x64 Windows issues)
OpenXML SDK 2.0 (for Office 2007 Documents)
I use these to grab the plain text and then store it in the database alongside the binary data. Keep in mind that I am certainly not an expert, so there may be a better way to do this, but this works for everything except "Quick Save" pre-2007 Word documents, which apparently aren't read by IFilters. I just have my users resave the document if that error occurs, and everything works fine.
Let me know if you'd like some sample code... I would post it right now, but it's a bit long.
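For the Office 2007 (.docx) part, a rough Python stand-in for what the OpenXML SDK is used for here needs only the standard library. A .docx is a ZIP archive whose body text lives in `w:t` elements of `word/document.xml`. This is a sketch, not the OpenXML SDK or IFilter approach. It ignores tabs, line breaks, headers, footers and footnotes, and text boxes nested inside paragraphs may be repeated. Pre-2007 binary .doc files are not ZIP archives, so they still need an IFilter.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace, in ElementTree's {uri}tag form.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_plain_text(path_or_file) -> str:
    """Return the body text of a .docx, one line per paragraph (basic sketch)."""
    with zipfile.ZipFile(path_or_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join their w:t pieces.
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```

The result can then be stored next to the original bytes, as described above.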