I'm working on some middleware in C# using the System.Data.Odbc library to interact with a v10 PSQL database. I have a set of working SELECT and INSERT queries that I run in sequence. Occasionally the full sequence executes without issue, but most of the time my error handling catches the following exception for each INSERT query in the sequence:
ERROR [HY000][Pervasive][ODBC Client Interface][LNA][Pervasive][ODBC Engine Interface][Date Record Manager]Field length is > maximum
I'm trying to understand what this means and how to resolve it.
This is on a Windows 2008 R2 server. I'm using C# in Visual Studio Community 2015 to take information from a web system (no issues there) and add sales orders to another system sitting on the server, which uses a Pervasive SQL v10 database.
The PSQL tables are massive, with 80-160 columns, so for the 3 tables I need to write to I first run a SELECT query to fetch the excess values, then bind them as parameters for the INSERT query. There are 4 SELECT/INSERT routines run in sequence, with the last one running n times in a foreach loop.
I've been able to run this ODBC SELECT/INSERT sequence on this system in the past using MS Access and PHP. I've tried cleaning the solution, rebooting the server and rebuilding, as well as adding additional Dispose() calls on the commands, but I still get these errors.
class PSQLOrderConnector
{
private OdbcConnection Odbc { get; }
public PSQLOrderConnector()
{
Odbc = new OdbcConnection(Constants.ODBCSTRING);
Odbc.Open();
}
/*
...
*/
public void CreateOrderAddressBillTo(string CustomerCode, string OrderNumber, string AddDate, int AddTime)
{
OdbcCommand cmdSelectAddressB = this.Odbc.CreateCommand();
cmdSelectAddressB.CommandText = OrderQueries.PSQL_SELECT_ADDRESS_BY_CUSTOMER;
cmdSelectAddressB.Parameters.Add("@CEV_NO", OdbcType.Text).Value = CustomerCode;
OdbcDataReader reader = cmdSelectAddressB.ExecuteReader();
reader.Read();
var NAME = reader.GetString(0);
/* ...repeat for the next 80 columns */
cmdSelectAddressB.Dispose();
OdbcCommand cmdInsertAddressB = this.Odbc.CreateCommand();
cmdInsertAddressB.CommandText = OrderQueries.PSQL_INSERT_ORDER_ADDRESS;
cmdInsertAddressB.Parameters.Add("NAME", OdbcType.Text).Value = NAME;
/* ...repeat for the next 80 variables */
try
{
int result = cmdInsertAddressB.ExecuteNonQuery();
}
catch (OdbcException odbce)
{
//Exception error gets thrown here
}
cmdInsertAddressB.Dispose();
}
/*
...
*/
}
class Order
{
private PSQLOrderConnector PSQLOrder { get; }
public Order()
{
PSQLOrder = new PSQLOrderConnector();
}
/*
...
*/
public void AddOrders(List<businessEvent> Events)
{
/*
...
*/
/* These 4 calls either pass in sequence or fail in sequence on the Try/Catch in the above class*/
PSQLOrder.CreateOrderHeader(OrderNumber, CustomerCode, PONumber, SubTotal, CurrentCost, AverageCost, AddDate, AddTime);
/* This is the method detailed above */
PSQLOrder.CreateOrderAddressBillTo(CustomerCode, OrderNumber, AddDate, AddTime);
PSQLOrder.CreateOrderAddressShipTo(CustomerCode, ShipToCode, OrderNumber, AddDate, AddTime);
int recNo = 1;
foreach (ItemLine line in itemLines)
{
PSQLOrder.CreateOrderDetail( OrderNumber, recNo, line.ItemCode, line.Quantity, line.Price, AddDate, AddTime);
recNo++;
}
}
}
(I edited the code for cleaner posting here; hopefully there are no typos.)
When those last lines run the function calls, either the error triggers for each insert in sequence, or the entire sequence completes successfully. Using the same or different inputs, this occurs randomly with roughly an 80/20 failure/success rate.
I've resolved my issue: the error message was exactly right and was pointing at my timestamp. Because I set AddTime once and then passed it to each of the 4 function calls, the calls would fail (or occasionally succeed) as a group; the successes probably happened in the first 9 seconds of the system minute, presumably when the value was a character shorter and still fit the field.
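The general takeaway from that error text: check the length of each bound value against the column's maximum before executing the INSERT, so the failure shows up as a clear .NET exception instead of a driver error. A minimal helper sketch (the column width and parameter name in the usage comment are made up for illustration):

using System;
using System.Data.Odbc;

static class OdbcParamGuard
{
    // Adds a text parameter only after verifying the value fits the column width.
    public static void AddBoundedText(OdbcCommand cmd, string name, string value, int maxLength)
    {
        if (value != null && value.Length > maxLength)
        {
            throw new ArgumentException(
                string.Format("Value for {0} is {1} chars; the column allows {2}.", name, value.Length, maxLength));
        }
        cmd.Parameters.Add(name, OdbcType.Text).Value = value;
    }
}

// usage (hypothetical 6-character time column):
// OdbcParamGuard.AddBoundedText(cmdInsertAddressB, "@ADD_TIME", AddTime.ToString(), 6);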
We are having trouble inserting into a FileTable. This is our current setup; I will try to explain it in as much detail as possible:
Basically we have three tables:
T_Document (main metadata of a document)
T_Version (versioned metadata of a document)
T_Content (the binary content of a document, FileTable)
Our WCF service saves the documents and is used by multiple users. The service opens a transaction and calls the method SaveDocument, which saves the documents:
//This is the point where the transaction starts
using (IDBTransaction tran = db.GetTransaction())
{
try
{
m_commandFassade.SaveDocument(document, m_loginName, db, options, lastVersion);
tran.Commit();
return document;
}
catch
{
tran.Rollback();
throw;
}
}
The SaveDocument method looks like this:
public void SaveDocument(E2TDocument document, string login, IDBConnection db, DocumentUploadOptions options, int lastVersion)
{
document.GuardNotNull();
options.GuardNotNull();
if (lastVersion == -1)
{
//inserting new T_Document
SaveDocument(document, db);
}
else
{
//updating the existing T_Document
UpdateDocument(document, db); //document already exists, updating it
}
Guid contentID = Guid.NewGuid();
//inserting the content
SaveDocumentContent(document, contentID, db);
//inserting the new / initial version
SaveDocumentVersion(document, contentID, db);
}
Basically all the methods you see either insert into or update those three tables. The insert query for the content table, which appears to be causing the trouble, looks like this:
INSERT INTO T_Content
(stream_id
,file_stream
,name)
VALUES
(@ContentID
,@Content
,@Title)
And the method (please take this as pseudo code):
private void SaveDocumentContent(E2TDocument e2TDokument, Guid contentID, IDBConnection db)
{
using (m_log.CreateScope<MethodScope>(GlobalDefinitions.TracePriorityForData))
{
Command cmd = CommandFactory.CreateCommand("InsertContents");
cmd.SetParameter("ContentID", contentID);
cmd.SetParameter("Content", e2TDokument.Content);
string title = string.Concat(e2TDokument.Titel.RemoveIllegalPathCharacters(), GlobalDefinitions.UNTERSTRICH,
contentID).TrimToMaxLength(MaxLength_T_Contents_Col_Name, SuffixLength_T_Contents_Col_Name);
cmd.SetParameter("Title", title);
db.Execute(cmd);
}
}
I have no experience in deadlock analysis, but the deadlock graphs show me that, when inserting the content into the FileTable, the statement is deadlocked with another process that is also writing into the same table at the same time.
(The other side of the graph shows the same statement, and my application log confirms two concurrent attempts to save documents.)
The same deadlock appears about 30 times a day. I have already shrunk the transaction to a minimum, removing all unnecessary selects, but I still haven't had any luck resolving the issue.
What I'm most curious about is how it's possible to deadlock on an insert into a FileTable. Are there internal statements being executed that I'm not aware of? I saw some strange statements in the profiler trace on that table that we don't use anywhere in our code, e.g.:
set @pathlocator = convert(hierarchyid, @path_locator__bin)
and things like:
if exists (
select 1
from [LGOL_Content01].[dbo].[T_Contents]
where parent_path_locator = @path_locator
)
If you need any more details, please let me know. Any tips how to proceed would be awesome.
Edit 1:
The execution plan for the T_Content insert was attached here as an image.
So, after hours and hours of research and consulting with Microsoft, the deadlock is actually a FileTable / SQL Server bug, which will be hotfixed by Microsoft.
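Until the hotfix is applied, a common interim mitigation is to retry the whole save when the transaction is chosen as the deadlock victim (SQL Server error 1205). A rough sketch against the code shown above; it assumes the deadlock surfaces as a SqlException, so adjust if your data layer wraps exceptions:

// requires: using System.Data.SqlClient; using System.Threading;
const int maxAttempts = 3;
for (int attempt = 1; ; attempt++)
{
    try
    {
        using (IDBTransaction tran = db.GetTransaction())
        {
            try
            {
                m_commandFassade.SaveDocument(document, m_loginName, db, options, lastVersion);
                tran.Commit();
                return document;
            }
            catch
            {
                tran.Rollback();
                throw;
            }
        }
    }
    catch (SqlException ex) when (ex.Number == 1205 && attempt < maxAttempts)
    {
        // We were the deadlock victim: back off briefly, then retry the save.
        Thread.Sleep(200 * attempt);
    }
}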
EDIT: It seems that the locks (mostly? only?) stay held when a transactional error occurs. I have to restart the database for things to work again, even though it is not actively processing anything (no CPU/RAM/HDD activity).
Environment
I have an ASP.NET application that uses the Neo4jClient NuGet package to talk to a Neo4j database. I have N SimpleNode objects that need to be inserted, where N can be anything from 100 to 50,000. There are other objects for the M edges that need to be inserted, where M can be from 100 to 1,000,000.
Code
Inserting using normal inserts is too slow; 8,000 nodes take about 80 seconds with the following code:
Client.Cypher
.Unwind(sublist, "node")
.Merge("(n:Node { label: node.label })")
.OnCreate()
.Set("n = node")
.ExecuteWithoutResults();
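One thing worth checking before giving up on this approach: MERGE is only fast if (:Node {label}) is backed by an index, and without a unique constraint each MERGE has to scan the existing nodes. A one-time schema setup sketch, assuming the Neo4jClient version in use exposes CreateUniqueConstraint (otherwise run the equivalent CREATE CONSTRAINT statement in the Neo4j browser):

// One-time setup: the unique constraint creates the index that backs MERGE lookups.
Client.Cypher
    .CreateUniqueConstraint("n:Node", "n.label")
    .ExecuteWithoutResults();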
Therefore I switched to the LOAD CSV approach with the following code:
using (var sw = new StreamWriter(File.OpenWrite("temp.csv")))
{
sw.Write(SimpleNodeModel.Header + "\n");
foreach (var simpleNodeModel in nodes)
{
sw.Write(simpleNodeModel.ToCSVWithoutID() + "\n");
}
}
var f = new FileInfo("temp.csv");
Client.Cypher
.LoadCsv(new Uri("file://" + f.FullName), "csvNode", true)
.Merge("(n:Node {label:csvNode.label, source:csvNode.source})")
.ExecuteWithoutResults();
While still slow, it is an improvement.
Problem
The problem is that the CSV files are locked by the Neo4j client (not by C# or any of my own code, it seems). I would like to overwrite the temp .csv files so the disk does not fill up, or delete them after use. That is currently impossible because the process locks them and I cannot touch them. It also means that running this code twice crashes the program, as it cannot write to the file the second time.
The nodes are inserted and do appear normally, so it is not the case that the client is still working on them. After some unknown and widely varying amount of time, the files do seem to unlock.
Question
How can I stop the Neo4j client from locking the files long after use, and why does it lock them for so long? Another question: is there a better way of doing this in C#? I am aware of the Java importer, but I would like my tool to stay in the ASP.NET environment. Surely it must be possible to insert 8,000 simple nodes within 2 seconds in C#?
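The stop-gap I can think of is to write each batch to a uniquely named temp file and delete it afterwards on a best-effort basis (so a lingering lock never blocks the next run), roughly like the sketch below, but I'd prefer a real fix:

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

static class TempCsv
{
    // Write a batch to a uniquely named file so a lingering lock on a previous
    // file never blocks the next import.
    public static string WriteBatch(string header, IEnumerable<string> lines)
    {
        string path = Path.Combine(Path.GetTempPath(), "neo4j-import-" + Guid.NewGuid().ToString("N") + ".csv");
        using (var sw = new StreamWriter(path))
        {
            sw.WriteLine(header);
            foreach (var line in lines)
                sw.WriteLine(line);
        }
        return path;
    }

    // Best-effort delete: retry a few times, then leave the file for a later
    // cleanup pass instead of crashing.
    public static void TryDelete(string path, int attempts = 5)
    {
        for (int i = 0; i < attempts; i++)
        {
            try { File.Delete(path); return; }
            catch (IOException) { Thread.Sleep(500); }
        }
    }
}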
SimpleNode class
public class SimpleNodeModel
{
public long id { get; set; }
public string label { get; set; }
public string source { get; set; } = "";
public override string ToString()
{
return $"label: {label}, source: {source}, id: {id}";
}
public SimpleNodeModel(string label, string source)
{
this.label = label;
this.source = source;
}
public SimpleNodeModel() { }
public static string Header => "label,source";
public string ToCSVWithoutID()
{
return $"{label},{source}";
}
}
When I use a prepared statement for async execution of multiple statements, I get JSON with broken data. Keys and values get totally corrupted.
I first encountered this issue while performing stress testing of our project using a custom script. We are using the DataStax C++ driver and executing statements from different fibers.
Then I tried to isolate the problem and wrote a simple C# program that starts multiple Tasks in a loop. Each task uses the prepared statement (created once) to read data from the database. For some rows the result is a total mess, e.g.:
Expected (fetched by cqlsh)
516b00a2-01a7-11e6-8630-c04f49e62c6b |
lucid_lynx_value_45404 |
precise_pangolin_value_618429 |
saucy_salamander_value_302796 |
trusty_tahr_value_873 |
vivid_vervet_value_216045 |
wily_werewolf_value_271991
Actual
{
"sa": "516b00a2-01a7-11e6-8630-c04f49e62c6b",
"lucid_lynx": "wily_werewolflue_45404",
"precise_pangolin": "precise_pangolin_value_618429",
"saucy_salamander": "saucy_salamander_value_302796",
"trusty_tahr": "trusty_tahr_value_873",
"vivid_vervet": "vivid_vervet_value_216045",
"wily_werewolf": "wily_werewolf_value_271991"
}
Here is the main part of the C# code.
static void Main(string[] args)
{
const int task_count = 300;
using(var cluster = Cluster.Builder().AddContactPoints(/*contact points here*/).Build())
{
using(var session = cluster.Connect())
{
var prepared = session.Prepare("select json * from test_neptunao.ubuntu where id=?");
var tasks = new Task[task_count];
for(int i = 0; i < task_count; i++)
{
tasks[i] = Query(prepared, session);
}
Task.WaitAll(tasks);
}
}
Console.ReadKey();
}
private static Task Query(PreparedStatement prepared, ISession session)
{
string id = GetIdOfRandomRow();
var stmt = prepared.Bind(id);
stmt.SetConsistencyLevel(ConsistencyLevel.One);
return session.ExecuteAsync(stmt).ContinueWith(tr =>
{
foreach(var row in tr.Result)
{
var value = row.GetValue<string>(0);
//some kind of output
}
});
}
CQL script with test DB schema.
CREATE KEYSPACE IF NOT EXISTS test_neptunao
WITH replication = {
'class' : 'SimpleStrategy',
'replication_factor' : 3
};
use test_neptunao;
create table if not exists ubuntu (
id timeuuid PRIMARY KEY,
precise_pangolin text,
trusty_tahr text,
wily_werewolf text,
vivid_vervet text,
saucy_salamander text,
lucid_lynx text
);
UPD
Expected JSON
{
"id": "516b00a2-01a7-11e6-8630-c04f49e62c6b",
"lucid_lynx": "lucid_lynx_value_45404",
"precise_pangolin": "precise_pangolin_value_618429",
"saucy_salamander": "saucy_salamander_value_302796",
"trusty_tahr": "trusty_tahr_value_873",
"vivid_vervet": "vivid_vervet_value_216045",
"wily_werewolf": "wily_werewolf_value_271991"
}
UPD2
Here is the sample C# project mentioned above.
UPD3
The issue was resolved after upgrading to Cassandra 3.5.
It sounds like you're seeing CASSANDRA-11048 (JSON Queries are not thread safe). Upgrading Cassandra to a version with the fix is the best way to resolve this.
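If upgrading isn't immediately possible, a workaround is to avoid the server-side SELECT JSON path entirely: select the columns normally and build the JSON on the client. A minimal sketch, assuming the DataStax C# driver and Newtonsoft.Json, with the column list taken from the schema above:

using System;
using System.Collections.Generic;
using System.Linq;
using Cassandra;
using Newtonsoft.Json;

static class ClientSideJson
{
    // prepared = session.Prepare(
    //     "select id, lucid_lynx, precise_pangolin, saucy_salamander, " +
    //     "trusty_tahr, vivid_vervet, wily_werewolf from test_neptunao.ubuntu where id = ?");
    public static string ReadAsJson(ISession session, PreparedStatement prepared, Guid id)
    {
        var row = session.Execute(prepared.Bind(id)).FirstOrDefault();
        if (row == null)
            return null;

        var obj = new Dictionary<string, object>
        {
            { "id", row.GetValue<Guid>("id") }, // timeuuid maps to Guid/TimeUuid in the driver
            { "lucid_lynx", row.GetValue<string>("lucid_lynx") },
            { "precise_pangolin", row.GetValue<string>("precise_pangolin") },
            { "saucy_salamander", row.GetValue<string>("saucy_salamander") },
            { "trusty_tahr", row.GetValue<string>("trusty_tahr") },
            { "vivid_vervet", row.GetValue<string>("vivid_vervet") },
            { "wily_werewolf", row.GetValue<string>("wily_werewolf") }
        };
        return JsonConvert.SerializeObject(obj);
    }
}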
The only error I see in the generated JSON is the name of the primary key, which should be "id" instead of "sa". Otherwise the other columns are correct.
{
"sa": "516b00a2-01a7-11e6-8630-c04f49e62c6b",
"lucid_lynx": "wily_werewolflue_45404",
"precise_pangolin": "precise_pangolin_value_618429",
"saucy_salamander": "saucy_salamander_value_302796",
"trusty_tahr": "trusty_tahr_value_873",
"vivid_vervet": "vivid_vervet_value_216045",
"wily_werewolf": "wily_werewolf_value_271991"
}
What kind of JSON structure did you expect to get as a result?
I have inherited a WCF web service application that needs much better error tracking. What we do is query data from one system (AcuODBC) and send that data to another system (Salesforce). The query returns tens of thousands of complex objects as a List<T>. We then process this List<T> in batches of 200 records at a time, mapping the fields to another object type, and send each batch to Salesforce. After one batch is completed, the next batch starts. Here's a brief example:
int intStart = 0, intEnd = 200;
//done in a loop, snipped for brevity
var leases = from i in trleases.GetAllLeases(branch).Skip(intStart).Take(intEnd)
select new sforceObject.SFDC_Lease() {
LeaseNumber = i.LeaseNumber.ToString(),
AccountNumber = i.LeaseCustomer,
Branch = i.Branch
// (...) about 150 properties
};
//do stuff with list and increment to next batch
intStart += 200;
However, the problem is that if one object has a bad field mapping (InvalidCastException), I would like to log the object that failed.
Question
Is there any way I can decipher which of the 200 objects threw the exception? I could forgo the batch concept that was given to me, but I'd rather avoid that if possible for performance reasons.
This should accomplish what you are looking for with very minor code changes:
int intStart = 0, intEnd = 200, count = 0;
List<sforceObject.SFDC_Lease> leases = new List<sforceObject.SFDC_Lease>();
//done in a loop, snipped for brevity
foreach(var i in trleases.GetAllLeases(branch).Skip(intStart).Take(intEnd)) {
try {
count++;
leases.Add(new sforceObject.SFDC_Lease() {
LeaseNumber = i.LeaseNumber.ToString(),
AccountNumber = i.LeaseCustomer,
Branch = i.Branch
// (...) about 150 properties
});
} catch (Exception ex) {
// you now have your culprit, either as 'i' or via the index 'count'
}
}
//do stuff with 'leases' and increment to next batch
intStart += 200;
I think you could set a flag in each property setter of the SFDC_Lease class, using a static field to record the last property that was set, like this:
public class SFDC_Lease
{
    public static string LastPropertySetted;

    private string _leaseNumber;
    public string LeaseNumber
    {
        get { return _leaseNumber; }
        set
        {
            LastPropertySetted = "LeaseNumber";
            _leaseNumber = value;
        }
    }
}
Please feel free to improve this design.
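For context, the catch block in the batching loop from the previous answer could then report the last property that was set before the failure, roughly like this fragment (Console output stands in for whatever logging you use):

try
{
    leases.Add(new SFDC_Lease()
    {
        LeaseNumber = i.LeaseNumber.ToString(),
        AccountNumber = i.LeaseCustomer,
        Branch = i.Branch
        // (...) about 150 properties
    });
}
catch (Exception ex)
{
    // LastPropertySetted names the property whose setter ran last, so the
    // failing mapping is the one just after it in the initializer.
    Console.WriteLine("Mapping failed after '{0}': {1}", SFDC_Lease.LastPropertySetted, ex.Message);
}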
We need to index (in ASP.NET) all the records stored in a SQL Server table. The table has around 2M records, with text (nvarchar) data in each row as well.
Is it okay to fetch all records in one go as we need to index them (for search)? What is the other option (I want to avoid pagination)?
Note: I am not displaying these records, just need all of them in one go so that I can index them via a background thread.
Do I need to set any long time outs for my query? If yes, what is the most effective method for setting longer time outs if I am running the query from ASP.NET page?
If I needed something like this, just thinking about it from the database side, I'd probably export it to a file. Then that file can get moved around pretty easily. Moving around data sets that large is a huge pain to all involved. You can use SSIS, sqlcmd or even bcp in a batch command to get it done.
Then, you just have to worry about what you're doing with it on the app side, no worries about locking & everything on the database side once you've exported it.
I don't think a page is a good place for this regardless. There should be a different process or program that does this. On a related note maybe something like http://incubator.apache.org/lucene.net/ would help you?
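If Lucene.Net is an option, the indexing side itself is fairly small. A minimal sketch, assuming Lucene.Net 3.0.x and made-up field names:

using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

static class Indexer
{
    // Adds one batch of records (Key = id, Value = text) to a file-system index.
    public static void IndexBatch(string indexPath, IEnumerable<KeyValuePair<int, string>> records)
    {
        var dir = FSDirectory.Open(new DirectoryInfo(indexPath));
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
        using (var writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            foreach (var record in records)
            {
                var doc = new Document();
                doc.Add(new Field("Id", record.Key.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
                doc.Add(new Field("Text", record.Value, Field.Store.NO, Field.Index.ANALYZED));
                writer.AddDocument(doc);
            }
            writer.Commit();
        }
    }
}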
Is it okay to fetch all records in one go as we need to index them (for search)? What is the other option (I want to avoid pagination)?
Memory Management Issue / Performance Issue
You can face an OutOfMemoryException if you bring back 2 million records, since you would be keeping all of those records in a DataSet, and the DataSet's memory lives in RAM.
Do I need to set any long time outs for my query? If yes, what is the most effective method for setting longer time outs if I am running the query from ASP.NET page?
using (System.Data.SqlClient.SqlCommand cmd = new System.Data.SqlClient.SqlCommand())
{
    cmd.CommandTimeout = 0; // 0 means no limit: the command waits indefinitely
}
Suggestion
It's better to filter the records at the database level...
Alternatively, fetch all the records from the database and save them to a file, then access that file for any intermediate operations.
What you describe is Extract, Transform, Load (ETL). There are 2 options I'm aware of:
SSIS, which is part of SQL Server
Rhino.ETL
I prefer Rhino.Etl as it's completely written in C#, you can create scripts in Boo, and it's much easier to test and compose ETL processes. The library is also built to handle large sets of data, so memory management is built in.
One final note: while ASP.NET might be the entry point to start the indexing process, I wouldn't run the process within ASP.NET, as it could take minutes or hours depending on the number of records and the processing involved.
Instead, have ASP.NET be the entry point that fires off a background task to process the records. Ideally it would be completely independent of ASP.NET, so you avoid any timeout or shutdown issues.
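A minimal sketch of that hand-off, assuming .NET 4.5.2+ where HostingEnvironment.QueueBackgroundWorkItem is available (a dedicated Windows service or scheduled task is safer still):

using System.Threading;
using System.Web.Hosting;

public static class IndexingStarter
{
    // Queues the long-running indexing work outside the request thread.
    // ASP.NET tries to delay app-domain shutdown while queued items run,
    // but a separate service or scheduled task is still the safer home for this.
    public static void Start()
    {
        HostingEnvironment.QueueBackgroundWorkItem((CancellationToken ct) => IndexAllRecords(ct));
    }

    private static void IndexAllRecords(CancellationToken ct)
    {
        // Placeholder: read records in batches and push them into the search index,
        // checking ct.IsCancellationRequested between batches.
    }
}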
Process your records in batches. You are going to have two main issues: (1) you need to index all of the existing records, and (2) you will want to update the index with records that were added, updated or deleted. It might sound easier just to drop the index and recreate it, but that should be avoided if possible. Below is an example of processing [Production].[TransactionHistory] from the AdventureWorks2008R2 database in batches of 10,000 records. It does not load all of the records into memory. Output on my local computer produces "Processed 113443 records in 00:00:00.2282294". Obviously, this doesn't take into consideration a remote computer or the processing time for each record.
class Program
{
private static string ConnectionString
{
get { return ConfigurationManager.ConnectionStrings["db"].ConnectionString; }
}
static void Main(string[] args)
{
int recordCount = 0;
int lastId = -1;
bool done = false;
Stopwatch timer = Stopwatch.StartNew();
do
{
done = true;
IEnumerable<TransactionHistory> transactionDataRecords = GetTransactions(lastId, 10000);
foreach (TransactionHistory transactionHistory in transactionDataRecords)
{
lastId = transactionHistory.TransactionId;
done = false;
recordCount++;
}
} while (!done);
timer.Stop();
Console.WriteLine("Processed {0} records in {1}", recordCount, timer.Elapsed);
}
/// <summary>Get a new open connection.</summary>
private static SqlConnection GetOpenConnection()
{
SqlConnection connection = new SqlConnection(ConnectionString);
connection.Open();
return connection;
}
private static IEnumerable<TransactionHistory> GetTransactions(int lastTransactionId, int count)
{
const string sql = "SELECT TOP(@count) [TransactionID],[TransactionDate],[TransactionType] FROM [Production].[TransactionHistory] WHERE [TransactionID] > @LastTransactionId ORDER BY [TransactionID]";
return GetData<TransactionHistory>((connection) =>
{
SqlCommand command = new SqlCommand(sql, connection);
command.Parameters.AddWithValue("#count", count);
command.Parameters.AddWithValue("#LastTransactionId", lastTransactionId);
return command;
}, DataRecordToTransactionHistory);
}
// function to convert a data record to a TransactionHistory object
private static TransactionHistory DataRecordToTransactionHistory(IDataRecord record)
{
TransactionHistory transactionHistory = new TransactionHistory();
transactionHistory.TransactionId = record.GetInt32(0);
transactionHistory.TransactionDate = record.GetDateTime(1);
transactionHistory.TransactionType = record.GetString(2);
return transactionHistory;
}
private static IEnumerable<T> GetData<T>(Func<SqlConnection, SqlCommand> commandBuilder, Func<IDataRecord, T> dataFunc)
{
using (SqlConnection connection = GetOpenConnection())
{
using (SqlCommand command = commandBuilder(connection))
{
using (IDataReader reader = command.ExecuteReader())
{
while (reader.Read())
{
T record = dataFunc(reader);
yield return record;
}
}
}
}
}
}
public class TransactionHistory
{
public int TransactionId { get; set; }
public DateTime TransactionDate { get; set; }
public string TransactionType { get; set; }
}