I am coming across an issue where JObject.Parse() truncates my JSON read from an Excel file. It does not work on my machine, but it will work for the exact same data on another person's machine using the same method and implementation.
Basically, we are calling a method that is part of an internal framework which reads an Excel doc (data provider) based on the test method that is calling it. Once the row is selected, it pulls the data stored in the column's cell. This data is in JSON format. I have used 3 different JSON validators to ensure the JSON is valid.
The JSON below contains just filler data, as I cannot share the actual JSON:
{
  "columns": [
    "column1",
    "column2",
    "column3",
    "column4",
    "column5",
    "column6",
    "column7",
    "column8"
  ],
  "data": []
}
When attempting to return the JSON as a JObject, the following is done:
var data = JObject.Parse(MyObject.value.AColumn[0]);
This returns the data in the specified cell as a JObject.
When debugging this, I have typed the JSON into the Excel cell and gotten data to return, but at some point the data starts getting truncated, as if there is a specific character limit. But again, this works perfectly fine on someone else's machine.
I get this error because of the JSON being truncated:
Unterminated string. Expected delimiter: ". Path 'columns[10]' line 13, position 9.
We are using Newtonsoft to handle the JSON and Dapper's connection.Query to execute a simple query against the .xlsx spreadsheet.
What I am finding is that when executing the query over the OLEDB connection, the returned string maxes out at 255 characters. So this looks more like a Dapper / OleDbConnection issue where I need to set the max length higher.
Here is the code for that:
// executing the query and retrieving the test data from the specific column
var query = string.Format("select * from [DataSet$] where TestName = '{0}'", testName);
var value = connection.Query(query).Select(x =>
{
    var result = new MyObject { TestName = x.testName };
    foreach (var element in x)
    {
        if (element.Key.Contains(column))
        {
            result.CustomColumns.Add(element.Value.ToString());
        }
    }
    return result;
}).FirstOrDefault();
Where x is a dynamic data type.
Has anyone come across this before? Is there some hidden character that is preventing this?
Answered in comments within the main question.
The issue was that the OLEDB connection reads the first 8 rows to determine the data type of subsequent rows.
The cell data being pulled was a JSON string. The OLEDB connection was reading the string; however, when trying to parse the string to a JObject, parsing threw an exception.
Further debugging revealed that within the OLEDB connection, when reading the row of data, the string was getting truncated at 255 characters. Formatting the columns did not fix the issue, nor did adding OLEDB settings when creating the connection.
What resolved this was updating the system registry key that determines how many rows are read before the data type/length is decided. The solution can be found in the comment section of the original question.
Or here: Data truncated after 255 bytes while using Microsoft.Ace.Oledb.12.0 provider
Related
I've been using LinqToExcel to import data from .xlsx files successfully for a while. Recently, however, I was sent a .csv file whose data I'm unable to read.
Let's say that the file contains the following data:
Col1 Col2 Col3
A B C
D E F
I've created a class for mapping the columns as such:
public class Test
{
    [ExcelColumn("Col1")]
    public string Col1 { get; set; }

    [ExcelColumn("Col2")]
    public string Col2 { get; set; }

    [ExcelColumn("Col3")]
    public string Col3 { get; set; }
}
Then I try to read the data like so:
var test = from c in excel.Worksheet<Test>()
           select c;
The query successfully returns two Test-objects, but all property values are null.
I even tried to read the data without a class and header:
var test = from c in excel.WorksheetNoHeader()
           select c;
In this case, the query also returns two rows, both with three cells/values. But again all of these values are null. What could be the issue here?
I should also note that the file opens and looks perfectly fine in Excel. Furthermore, using StreamReader, I'm able to read all of its rows and values.
What type of data is in each of those columns? (string, numeric, ...)
According to Initializing the Microsoft Excel driver
TypeGuessRows
The number of rows to be checked for the data type. The data type is
determined given the maximum number of kinds of data found. If there
is a tie, the data type is determined in the following order: Number,
Currency, Date, Text, Boolean. If data is encountered that does not
match the data type guessed for the column, it is returned as a Null
value. On import, if a column has mixed data types, the entire column
will be cast according to the ImportMixedTypes setting. The default
number of rows to be checked is 8. Values are of type REG_DWORD.
See post Can I specify the data type for a column rather than letting linq-to-excel decide?
The post Setting TypeGuessRows for excel ACE Driver states how to change the value for TypeGuessRows.
When the driver determines that an Excel column contains text data,
the driver selects the data type (string or memo) based on the longest
value that it samples. If the driver does not discover any values
longer than 255 characters in the rows that it samples, it treats the
column as a 255-character string column instead of a memo column.
Therefore, values longer than 255 characters may be truncated. To
import data from a memo column without truncation, you must make sure
that the memo column in at least one of the sampled rows contains a
value longer than 255 characters, or you must increase the number of
rows sampled by the driver to include such a row. You can increase the
number of rows sampled by increasing the value of TypeGuessRows under
the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel
registry key.
One more thing to keep in mind is that the registry key
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows
only applies to Excel 97-2003. For Excel 2007 and higher versions, Excel Open XML (.XLSX extension) actually uses the ACE OLE DB provider rather than the JET provider. If you want to keep the file extension as .XLSX, you need to modify the following registry key according to your Excel version:
Excel 2007: HKEY_LOCAL_MACHINE\Software\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
Excel 2010: HKEY_LOCAL_MACHINE\Software\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
Excel 2013: HKEY_LOCAL_MACHINE\Software\Microsoft\Office\15.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
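As an illustration only, here is a minimal sketch of changing that value from C# with Microsoft.Win32.Registry (one possible approach, not taken from the linked posts). The key path assumes Excel 2010 (Office 14.0); 32-bit Office on 64-bit Windows keeps it under Wow6432Node instead, and writing to HKEY_LOCAL_MACHINE requires administrative rights. A value of 0 is commonly documented as "scan all rows".
using Microsoft.Win32;

// Hypothetical path for Excel 2010; adjust the Office version (12.0/14.0/15.0) and bitness.
const string keyPath =
    @"SOFTWARE\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel";

using (var key = Registry.LocalMachine.OpenSubKey(keyPath, writable: true))
{
    // 0 tells the driver to scan all rows before guessing the column type (default is 8).
    key?.SetValue("TypeGuessRows", 0, RegistryValueKind.DWord);
}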
Did you try to materialize your query by calling ToList or ToArray at the end?
I tried to recreate your case and had no trouble reading the data from the Excel file using the following code snippet:
var excel = new ExcelQueryFactory(FilePath);
List<Test> tests = (
    from c in excel.Worksheet<Test>()
    select c
).ToList();
It returns two objects with all properties filled properly.
One minor thing: when I added ToList initially, I got the following exception:
The 'Microsoft.ACE.OLEDB.12.0' provider is not registered on the local machine.'
Which, according to the official docs, seems reasonable, since I was missing the Microsoft Access Database Engine 2010 Redistributable on my machine.
I'm writing a script which will pull data from one database to another, with some alteration in between. MS SQL Server is used on both sides, but I'm having a problem where the SELECT query doesn't return the full JSON string.
I've tried a lot of different query methods and functions, but it seems like the problem is not in how I query the information, but rather in how the script receives the JSON.
This is the latest query I've tried, nothing fancy, and it retrieves the information:
var fishQuery = "SELECT [Id],[Name],[OrgNumber] FROM Camps ORDER BY [Id] OFFSET 5 ROWS FETCH NEXT 1000 ROWS ONLY FOR JSON PATH";
EDIT: I should mention that this is how I treat the string afterwards.
var campsInfo = fishConnection.Query<string>(fishQuery).First();
List<Camps> Camps = new List<Camps>();
Camps = (JsonConvert.DeserializeObject<List<Camps>>(campsInfo));
However, I get this error in Visual Studio:
Newtonsoft.Json.JsonReaderException: 'Unterminated string. Expected delimiter: ". Path '[20].Name', line 1, position 2033.'
If I print it to a file or the console, I get:
(A lot more valid JSON data before this...) (...) {\"Id\":\"173246A1-8069-437C-8731-05DBE69C784F\",\"Name\":\"SKANSE"
Which shows that I get invalid JSON since the result stops "halfway" through.
I've read some of these :
How do you view ALL text from an ntext or nvarchar(max) in SSMS?
SQL Server - "for json path" statement does not return more than 2984 lines of JSON string
But I'm not keen on making changes to my SQL Server, which is in production at the moment, before I know that I can fix it on my side!
Thanks for any help!
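For reference, here is a minimal sketch of one likely fix, assuming the truncation comes from FOR JSON splitting its output across multiple rows of roughly 2K characters (the behaviour discussed in the next question): concatenate every row Dapper returns instead of taking only the first.
// Sketch only: Query<string> returns one row per JSON chunk, so .First() keeps just the
// first chunk. Concatenating all returned rows rebuilds the complete JSON string.
var campsInfo = string.Concat(fishConnection.Query<string>(fishQuery));
List<Camps> Camps = JsonConvert.DeserializeObject<List<Camps>>(campsInfo);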
Recently I played around with the new for json auto feature of the Azure SQL database.
When I select a lot of records, for example with this query:
Select
    Wiki.WikiId
    , Wiki.WikiText
    , Wiki.Title
    , Wiki.CreatedOn
    , Tags.TagId
    , Tags.TagText
    , Tags.CreatedOn
From
    Wiki
    Left Join
        (WikiTag
         Inner Join
            Tag as Tags on WikiTag.TagId = Tags.TagId) on Wiki.WikiId = WikiTag.WikiId
For Json Auto
and then do a select with the C# SqlDataReader:
var connectionString = ""; // connection string
var sql = ""; // query from above
var chunks = new List<string>();
using (var connection = new SqlConnection(connectionString))
using (var command = connection.CreateCommand())
{
    command.CommandText = sql;
    connection.Open();
    var reader = command.ExecuteReader();
    while (reader.Read())
    {
        chunks.Add(reader.GetString(0)); // reads in chunks of ~2K bytes
    }
}
var json = string.Concat(chunks);
I get a lot of chunks of data.
Why do we have this limitation? Why don't we get everything in one big chunk?
When I read an nvarchar(max) column, I get everything in one chunk.
Thanks for an explanation.
From Format Query Results as JSON with FOR JSON:
Output of the FOR JSON clause
The result set contains a single column.
A small result set may contain a single row.
A large result set splits the long JSON string across multiple rows.
By default, SQL Server Management Studio (SSMS) concatenates the results into a single row when the output setting is Results to Grid. The SSMS status bar displays the actual row count.
Other client applications may require code to recombine lengthy results into a single, valid JSON string by concatenating the contents of multiple rows. For an example of this code in a C# application, see Use FOR JSON output in a C# client app.
I would say it is strictly for performance reasons, similar to XML. See also SELECT FOR XML AUTO and return datatypes and What does server side FOR XML return?
In SQL Server 2000 the server side XML publishing - FOR XML (see http://msdn2.microsoft.com/en-us/library/ms178107(SQL.90).aspx) - was implemented in the layer of code between the query processor and the data transport layer. Without FOR XML a SELECT query is executed by the query processor and the resulting rowset is sent to the client side by the server side TDS code. When a SELECT statement contains FOR XML the query processor produces the result the same way as without FOR XML and then the FOR XML code formats the rowset as XML. For maximum XML publishing performance FOR XML does streaming XML formatting of the resulting rowset and directly sends its output to the server side TDS code in small chunks without buffering the whole XML in the server space. The chunk size is 2033 UCS-2 characters. Thus, XML larger than 2033 UCS-2 characters is sent to the client side in multiple rows each containing a chunk of the XML. SQL Server uses a predefined column name for this rowset with one column of type NTEXT - "XML_F52E2B61-18A1-11d1-B105-00805F49916B" - to indicate a chunked XML rowset in UTF-16 encoding. This requires special handling of the XML chunk rowset by the APIs to expose it as a single XML instance on the client side. In ADO.Net, one needs to use ExecuteXmlReader, and in ADO/OLEDB one should use the ICommandStream interface.
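To illustrate that last point, here is a sketch (for XML rather than JSON, with a placeholder query and connection string) of letting ADO.NET reassemble the chunked rowset via SqlCommand.ExecuteXmlReader; ROOT() is added so the result loads as a well-formed document.
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "Select Wiki.WikiId, Wiki.Title From Wiki For Xml Auto, Root('Wikis')", connection))
{
    connection.Open();
    using (var xmlReader = command.ExecuteXmlReader())
    {
        var document = new XmlDocument(); // System.Xml
        document.Load(xmlReader);         // full XML, no manual chunk handling needed
    }
}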
As a workaround in the SQL code (i.e. if you don't want to change your querying code to put the chunks together), I found that wrapping the query in a CTE and then selecting from that gives me the results I expected:
--Note that I query from information_schema just to get a lot of data to replicate the problem.
--doing this query results in multiple rows (chunks) returned
SELECT * FROM information_schema.columns FOR JSON PATH, include_null_values
--doing this query results in a single row returned
;WITH SomeCTE(JsonDataColumn) AS
(
SELECT * FROM information_schema.columns FOR JSON PATH, INCLUDE_NULL_VALUES
)
SELECT JsonDataColumn FROM SomeCTE
The first query reproduces the problem for me (it returns multiple rows, each a chunk of the total data); the second query gives one row with all the data. SSMS wasn't good for reproducing the issue; you have to try it out with other client code.
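A minimal sketch of reading that wrapped query from C#, assuming (as described above) that the CTE version really does come back as a single row, so the whole JSON arrives in one call:
// wrappedQuery holds the ;WITH SomeCTE ... SELECT JsonDataColumn FROM SomeCTE statement above.
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(wrappedQuery, connection))
{
    connection.Open();
    var json = (string)command.ExecuteScalar(); // single row, single column
}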
I have a table with a column of data type "timestamp" in SQL Server.
Currently I am trying to get the data from this table into a SQLite database, which needs only a string value. So far I have not been able to find the correct way to convert it to a string.
So for example my SQL Value is 0x0000000000012DE0
When I get the record using Entity Framework, I get a byte array.
I tried to convert it to a string using the following code:
value = BitConverter.ToInt64(Version, 0);
However, for the same record, I get 0xE02D010000000000
This is one difference.
Secondly, since I am working on an Azure mobile app, this data also goes to Android via a Web API controller. The result I get from Fiddler is something in this format:
AAAAAAABM8s=
I also want to convert the byte array value into the above format.
Any suggestions?
I had a similar issue with my ASP.NET Core, EF Core 2.0, Angular 2 app. This was a database first development and changing the database definition was not within my remit.
In my case, EF Core automatically performed the concurrency check for me where the Timestamp column was present in the table. This was not an issue for updates because I could pass the DTO in the body but with deletes I could not.
To pass the timestamp I used a query string, the value being the Base64 string representation of the timestamp, e.g. AAAAAAACIvw=
In my repository layer I convert the string back to a byte array e.g.
byte[] ts = Convert.FromBase64String(timestampAsBase64String);
Then delete using the create and attach pattern (thereby eliminating any chance of lost updates)
ModelClass modelObj = new ModelClass { Id = id, Timestamp = ts };
_dbcontext.Entry(modelObj).State = EntityState.Deleted;
Thanks to this thread and the CodeProject article Converting Hexadecimal String to/from Byte Array in C# to get this resolved.
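For completeness, a small sketch of the conversions discussed above (the byte values are illustrative; rowVersion stands in for the byte[] that Entity Framework returns for the timestamp/rowversion column):
using System;
using System.Linq;

byte[] rowVersion = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x2D, 0xE0 };

// Hex string in the form SQL Server displays (0x0000000000012DE0):
string hex = "0x" + BitConverter.ToString(rowVersion).Replace("-", "");

// Base64 string, the format seen in the Web API / Fiddler output:
string base64 = Convert.ToBase64String(rowVersion);

// BitConverter assumes little-endian byte order on most machines, which is why the
// value looked reversed; reverse the array first to get the expected numeric value.
long numeric = BitConverter.ToInt64(rowVersion.Reverse().ToArray(), 0);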
I have been working with Excel spreadsheets and so far I have never had any problems with them. But this error, "Not a legal OleAut date.", showed up out of the blue when I tried to read an Excel file. Does anyone know how I can fix this? Here is the code I use to read the Excel file and put the data into a DataSet. It worked fine previously, but after I made some changes to the data source (which don't involve dates) this error showed up.
var fileName = string.Format("C:\\Drafts\\Excel 97-2003 formats\\All Data 09 26 2012_Edited.xls");
var connectionString = string.Format("Provider=Microsoft.Jet.OLEDB.4.0; data source={0}; Extended Properties=Excel 8.0;", fileName);
var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", connectionString);
DataSet Originalds = new DataSet();
adapter.Fill(Originalds, "Employees"); // this is where the error shows up
I sort of figured out a workaround to this problem: I changed the connection string to the latest OLEDB provider.
var connectionString = string.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties='Excel 12.0 Xml;HDR=YES;'", fileName);
I also made sure that the empty space after the last row of my Excel data is not being read as a value.
Look for different date formats; in my case one of the columns was in dd/mm/yyyy format and the other was in mm/dd/yyyy. Once the date formats were rectified, the data import worked.
In my case, I had a date in my table that had been inserted manually in the wrong format: it was 01/01/0019 instead of 01/01/2019.
I had this issue while reading Excel through OLEDB.
I found that in the Excel file many date columns were empty in the first records. I had 6000 records in the file, and one of the date columns had around 4000 empty cells at the start, so Excel could not work out what to convert the cells to. I added one dummy record filled with a date and processed my file. Alternatively, you can move a few records which have a date value in them to the top.
How can you fix this? Hijack your data with a dummy row at row 1 and force the column(s) in question into a string (in this case only; it is a data type error, so apply the fix according to the type).
It is necessary to understand what the data adaptor does: it interprets the data type of each column by examining, by default, the first 8 rows of data (sans the header if HDR=Yes is in the connection string) and deciding on a data type. Yes, there is an override: it can be raised in the connection string to 16 rows, which is almost never very helpful.
Data adaptors can do other nasty things, like skip strings in columns of mixed data types, like string/double (which is really just a string column, but not to the adaptor if the first rows are all double). It won't even give you the courtesy of an error in this example.
This often occurs in data coming from ERP sources that contain "free form" columns. User-defined columns are the usual suspects. These problems can be very difficult to find among other data type issues. I once spent quite a bit of time resolving an issue with a column that was typed as a string with a max length of 255 chars. Deep in the data there were cells that exceeded that length and threw errors.
If you don't want to advance to the level of "genetic engineering" in working with data adaptors, the fastest way to resolve an issue like this is to hijack your data and force the column(s) in question to the correct type (or an incorrect one, which you can then correct in your own code if need be). Plan B is to give the data back to the customer/user and tell them to correct it. Good luck with Plan B. There is a reason it isn't Plan A.
More on manipulating via the connection string and similar issues with the adaptor, but be wary: the results are not going to be 100% foolproof. I've tested changing IMEX and HDR settings extensively. If you want to get through the project quickly, hijack the data. OleDB & mixed Excel datatypes : missing data
Here is another posting similar in context; note all of the possible time-consuming solutions. I have yet to be convinced there is a better solution, and it simply defies the logic a programmer brings to the keyboard every morning. Too bad; you have a job to do, and sometimes you have to be a hack. DateTime format mismatch on importing from Excel Sheet
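As a concrete illustration of the connection-string approach mentioned above (a sketch only; as noted, it is not foolproof), IMEX=1 asks the driver to treat mixed-type columns as text and HDR=YES treats the first row as headers:
// Sketch: fileName is the path to the workbook, as in the earlier examples.
var connectionString = string.Format(
    "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};" +
    "Extended Properties='Excel 12.0 Xml;HDR=YES;IMEX=1;'",
    fileName);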