I have data in sets like this:
**SET 1:**
Time = 2017-11-01 13:18:10
Param1 = 42.42
Param2 = 47.11
Param3 = 12.34
.... (up to 100 parameters)
**SET 2:**
Time = 2017-11-01 13:18:20
Param1 = 45.17
Param2 = 46.11
Param3 = 12.35
.... (up to 100 parameters)
I get a new set of data every 10 seconds. I need to save the data in SQL Server (I am free to define the table).
Later I need to fetch the data from the database so that I can draw an XY graph with time on the X-axis and the parameters on the Y-axis.
I was thinking of saving my data as JSON, either just as a string in a table (where the string is JSON) or using the JSON support in SQL Server 2016.
What is the recommended way of doing this?
Performance is a big concern for me.
I did some tests:
Simple string: This is just for reference. My data column simply contains a string with a 5-digit number.
XML string with attributes: Data column is a string (nvarchar(MAX)) containing XML with 35 nodes like this:
<data>
<Regtime value='2017-08-21 13:56:05'/>
<MachineId value = 'Somefactory.SomeSite.DeviceId' />
<Values>
<B_T_SP value = '181.23' unit = '1234' />
<B_H_SP_Tdp value = '87.34' unit = '801' />
<B_A_SP_v_air value = '42.42' unit = '500' />
<S_T_SP value = '175' unit = '801' />
<S_A_SP_v_air value = '57.23' unit = '500'
...
XML String with nodes: Same as above but not using attributes:
<data>
<Regtime>'2017-11-01T12:59:02.2792518+01:00'</Regtime>
<MachineId>'Somefactory.SomeSite.DeviceId'</MachineId>
<Values>
<B_T_SP>
<value>666,50</value>
<unit>801</unit>
</B_T_SP>
<B_H_SP_Tdp>
<value>414,21</value>
<unit>801</unit>
</B_H_SP_Tdp>
<B_A_SP_v_air>
<value>41,83</value>
<unit>801</unit>
</B_A_SP_v_air>
<S_T_SP>
<value>20,70</value>
<unit>801</unit>
</S_T_SP>
...
JSON string: Data column is a string (nvarchar(MAX)) containing JSON with 35 nodes like this:
{
"data": {
"Regtime": "2017-11-02T12:57:00.3745960+01:00",
"MachineId": "Somefactory.SomeSite.DeviceId",
"Values": {
"B_T_SP": {
"value": "703,81",
"unit": "801"
},
"B_H_SP_Tdp": {
"value": "485,90",
"unit": "801"
},
"B_A_SP_v_air": {
"value": "3,65",
"unit": "801"
},
"S_T_SP": {
"value": "130,44",
"unit": "801"
},
...
Distributed: As CodeCaster suggested, using two tables, with 35 SetParameters rows per Set.
When inserting data I do it like this:
startTime = DateTime.Now;
using (ConnectionScope cs = new ConnectionScope())
{
for (int i = 0; i < counts; i++)
{
sql = GetSqlAddData(DataType.XmlAttributeString);
using (IDbCommand c = cs.CreateCommand(sql))
{
c.ExecuteScalar();
}
}
}
logText.AppendText(string.Format("{0}x XML attribute string Insert took \t{1}\r\n", counts, DateTime.Now.Subtract(startTime)));
The exception is Distributed, where I do it like this:
using (ConnectionScope cs = new ConnectionScope())
{
for (int i = 0; i < counts; i++)
{
sql = GetSqlAddData(DataType.Distributed);
using (IDbCommand c = cs.CreateCommand(sql))
{
id = (int)c.ExecuteScalar();
}
for (int j = 0; j < 35; j++)
{
using (IDbCommand c = cs.CreateCommand($"INSERT into test_datalog_distr_detail (setid, name, value) VALUES ({id}, '{"param"+j}', '{j*100 + i}')"))
{
c.ExecuteScalar();
}
}
}
}
logText.AppendText(string.Format("{0}x Distributed Insert \t{1}\r\n", counts, DateTime.Now.Subtract(startTime)));
When I read data I do it like this:
Simple String:
var data = new List<Tuple<DateTime, string>>();
DateTime time;
string point;
string sql = GetSqlGetData(DataType.SimpleString);
var startTime = DateTime.Now;
using (ConnectionScope cs = new ConnectionScope())
{
using (IDbCommand cmd = cs.CreateCommand(sql))
{
using (IDataReader reader = cmd.ExecuteReader())
{
while (reader.Read())
{
time = DateTime.Parse(reader[5].ToString());
point = reader[12].ToString();
data.Add(new Tuple<DateTime, string>(time, point));
}
}
}
}
logText.AppendText(string.Format("{0}x Simple Select {1}\r\n", counts, DateTime.Now.Subtract(startTime)));
XML both with and without nodes:
sql = GetSqlGetData(DataType.XmlAttributeString);
startTime = DateTime.Now;
var doc = new XmlDocument();
using (ConnectionScope cs = new ConnectionScope())
{
using (IDbCommand cmd = cs.CreateCommand(sql))
{
using (IDataReader reader = cmd.ExecuteReader())
{
while (reader.Read())
{
time = DateTime.Parse(reader[5].ToString());
doc = new XmlDocument();
doc.LoadXml(reader[12].ToString());
point = doc.SelectSingleNode("/data/Values/B_T_SP").Attributes["value"].Value;
data.Add(new Tuple<DateTime, string>(time, point));
}
}
}
}
logText.AppendText(string.Format("{0}x Select using XmlDoc and Attribute String {1}\r\n", counts, DateTime.Now.Subtract(startTime)));
JSON:
JObject jobj;
using (ConnectionScope cs = new ConnectionScope())
{
using (IDbCommand cmd = cs.CreateCommand(sql))
{
using (IDataReader reader = cmd.ExecuteReader())
{
while (reader.Read())
{
time = DateTime.Parse(reader[5].ToString());
jobj = JObject.Parse(reader[12].ToString());
point = jobj["data"]["Values"]["B_T_SP"].ToString();
data.Add(new Tuple<DateTime, string>(time, point));
}
}
}
}
logText.AppendText(string.Format("{0}x Select using JSON String {1}\r\n", counts, DateTime.Now.Subtract(startTime)));
Distributed tables (CodeCaster recommendation)
sql = GetSqlGetData(DataType.Distributed);
startTime = DateTime.Now;
using (ConnectionScope cs = new ConnectionScope())
{
using (IDbCommand cmd = cs.CreateCommand(sql))
{
using (IDataReader reader = cmd.ExecuteReader())
{
while (reader.Read())
{
time = DateTime.Parse(reader[5].ToString());
point = reader[15].ToString();
data.Add(new Tuple<DateTime, string>(time, point));
}
}
}
}
logText.AppendText(string.Format("{0}x Select on distributed tables {1}\r\n", counts, DateTime.Now.Subtract(startTime)));
I did a test run where I inserted 100,000 rows into the database, and then read 100,000 rows back fetching one value from each, measuring the time used:

| Variant | Insert 100,000 rows | Read 100,000 rows, one value each |
| --- | --- | --- |
| Simple string value | 18 seconds | 0.4 seconds |
| XML string (attributes) | 36 seconds | 5.8 seconds |
| XML string (nodes only) | 38 seconds | 7.4 seconds |
| JSON string | 37 seconds | 9.4 seconds |
| Distributed (CodeCaster) | 8 minutes (!) | 0.5 seconds |
So far my conclusion is:
I am surprised that XML seems faster than JSON. I expected the distributed select to be faster than XML, especially because the selection of one parameter is done in SQL and not afterwards as with JSON and XML. But the insert into the distributed tables worries me. What I still need to test is storing the XML and JSON as native types in the database rather than as plain strings, so that I can select which parameter to use in SQL and not afterwards in an XmlDocument.
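For that follow-up test, SQL Server can do the per-parameter extraction on the server side: XQuery's value() on an xml-typed column, or JSON_VALUE() on an nvarchar column holding JSON (SQL Server 2016+). Below is a minimal sketch of what I have in mind; the table names (test_datalog_xml, test_datalog_json), column names, and connection string are placeholders standing in for my test schema.

// A minimal sketch of letting SQL Server extract one parameter server-side.
// Table names, column names and the connection string are placeholders; the
// 'data' column is assumed to be of type xml in the first query and
// nvarchar(MAX) holding JSON in the second (SQL Server 2016+), and 'regtime'
// is assumed to be a datetime column.
using System;
using System.Data.SqlClient;

class ServerSideExtract
{
    static void Main()
    {
        const string xmlSql =
            @"SELECT t.regtime,
                     t.data.value('(/data/Values/B_T_SP/@value)[1]', 'nvarchar(20)') AS point
              FROM test_datalog_xml AS t";

        const string jsonSql =
            @"SELECT t.regtime,
                     JSON_VALUE(t.data, '$.data.Values.B_T_SP.value') AS point
              FROM test_datalog_json AS t";

        using (var conn = new SqlConnection("Server=.;Database=Test;Integrated Security=true"))
        using (var cmd = new SqlCommand(xmlSql, conn)) // or jsonSql
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine("{0}: {1}", reader.GetDateTime(0), reader[1]);
            }
        }
    }
}

The point is that only the time and the single requested parameter cross the wire, instead of the whole XML/JSON document being parsed client-side per row.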
Let's not store JSON in a database column, unless you want to create the inner-platform effect or a database-in-a-database.
Especially not if you want to meaningfully query the data stored therein. Sure, there are database systems that support this, but if you have the ability to inspect and transform the JSON beforehand, you should definitely go for that option.
Simply normalize the params into a junction table called SetParameters, with a foreign key to the Sets table.
So you'll end up with two tables:
Sets
Id
Time
SetParameters
Id
SetId
Name
Value
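For concreteness, the outline could map to tables like the ones below, created from C# in the same style as the rest of the test code. The column types, identity keys, and the index on SetId are my assumptions, not something the outline prescribes.

// A possible concrete version of the two tables above. Column types, identity
// keys and the index on SetId are assumptions, not part of the outline.
using System.Data.SqlClient;

class CreateSetTables
{
    const string Ddl = @"
        CREATE TABLE Sets (
            Id   int IDENTITY(1,1) PRIMARY KEY,
            Time datetime2 NOT NULL
        );
        CREATE TABLE SetParameters (
            Id    int IDENTITY(1,1) PRIMARY KEY,
            SetId int NOT NULL REFERENCES Sets(Id),
            Name  nvarchar(100) NOT NULL,
            Value decimal(18,4) NOT NULL
        );
        CREATE INDEX IX_SetParameters_SetId ON SetParameters(SetId);";

    static void Main()
    {
        using (var conn = new SqlConnection("Server=.;Database=Test;Integrated Security=true"))
        using (var cmd = new SqlCommand(Ddl, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}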
Just wanted to share what I ended up doing:
JSON simply used up too much database space, so I went with a three-level database structure, where the parameter names are stored in one table and the parameter values in another, to reduce the footprint.
I then used Table-Valued Parameters (TVPs) to insert the data efficiently.
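For anyone curious, a TVP-based insert looks roughly like the sketch below. The table type, target table, and column names are illustrative placeholders rather than my actual schema; the essential parts are SqlDbType.Structured and the TypeName of the user-defined table type, which let a whole set of parameter rows travel to the server in one round trip.

// A rough sketch of a TVP-based insert. Assumes a table type created once, e.g.:
//   CREATE TYPE dbo.ParamRow AS TABLE (SetId int, NameId int, Value decimal(18,4));
// All names here are illustrative, not my actual schema.
using System.Data;
using System.Data.SqlClient;

static class TvpInsert
{
    // conn is assumed to be already open.
    public static void InsertSet(SqlConnection conn, int setId, (int nameId, decimal value)[] parameters)
    {
        var rows = new DataTable();
        rows.Columns.Add("SetId", typeof(int));
        rows.Columns.Add("NameId", typeof(int));
        rows.Columns.Add("Value", typeof(decimal));
        foreach (var (nameId, value) in parameters)
            rows.Rows.Add(setId, nameId, value);

        using (var cmd = new SqlCommand(
            "INSERT INTO SetParameterValues (SetId, NameId, Value) " +
            "SELECT SetId, NameId, Value FROM @rows", conn))
        {
            var p = cmd.Parameters.AddWithValue("@rows", rows);
            p.SqlDbType = SqlDbType.Structured; // mark the parameter as a TVP
            p.TypeName = "dbo.ParamRow";        // the user-defined table type
            cmd.ExecuteNonQuery();              // one round trip for the whole set
        }
    }
}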
Related
I have a C# MVC app using Dapper. There is a list table page which has several optional filters (as well as paging). A user can select (or not) any of several filters (about 8 right now, but the number could grow), each with a drop-down for a from value and a to value. So, for example, a user could select the category "price" and filter from "$100" to "$200". However, I don't know beforehand how many categories the user is filtering on, and not all of the filter categories are the same type (some int, some decimal/double, some DateTime, though they all come in as string on FilterRange).
I'm trying to build a (relatively) simple yet sustainable Dapper query for this. So far I have this:
public List<PropertySale> GetSales(List<FilterRange> filterRanges, int skip = 0, int take = 0)
{
var skipTake = " order by 1 ASC OFFSET @skip ROWS";
if (take > 0)
skipTake += " FETCH NEXT @take";
var ranges = " WHERE 1 = 1 ";
for(var i = 0; i < filterRanges.Count; i++)
{
ranges += " AND @filterRanges[i].columnName BETWEEN @filterRanges[i].fromValue AND @filterRanges[i].toValue ";
}
using (var conn = OpenConnection())
{
string query = @"Select * from Sales "
+ ranges
+ skipTake;
return conn.Query<Sale>(query, new { filterRanges, skip, take }).AsList();
}
}
I keep getting an error saying "... filterRanges cannot be used as a parameter value".
Is it even possible to do this in Dapper? All of the IEnumerable examples I see are for WHERE ... IN queries, which doesn't fit this situation. Any help is appreciated.
You can use the DynamicParameters class for generic fields.
Dictionary<string, object> Filters = new Dictionary<string, object>();
Filters.Add("UserName", "admin");
Filters.Add("Email", "admin@admin.com");
var builder = new SqlBuilder();
var select = builder.AddTemplate("select * from SomeTable /**where**/");
var parameter = new DynamicParameters();
foreach (var filter in Filters)
{
parameter.Add(filter.Key, filter.Value);
builder.Where($"{filter.Key} = @{filter.Key}");
}
var searchResult = appCon.Query<ApplicationUser>(select.RawSql, parameter);
You can use a list of dynamic column values, but you cannot do the same for the column names other than by string formatting, which can open the door to SQL injection.
You have to validate the column names from the list, to be sure they really exist, before using them in a SQL query.
This is how you can use the list of filterRanges dynamically :
const string sqlTemplate = "SELECT /**select**/ FROM Sale /**where**/ /**orderby**/";
var sqlBuilder = new SqlBuilder();
var template = sqlBuilder.AddTemplate(sqlTemplate);
sqlBuilder.Select("*");
for (var i = 0; i < filterRanges.Count; i++)
{
sqlBuilder.Where($"{filterRanges[i].ColumnName} = @columnValue", new { columnValue = filterRanges[i].FromValue });
}
using (var conn = OpenConnection())
{
return conn.Query<Sale>(template.RawSql, template.Parameters).AsList();
}
You can easily create that dynamic condition using DapperQueryBuilder:
using (var conn = OpenConnection())
{
var query = conn.QueryBuilder($@"
SELECT *
FROM Sales
/**where**/
order by 1 ASC
OFFSET {skip} ROWS FETCH NEXT {take} ROWS ONLY
");
foreach (var filter in filterRanges)
query.Where($@"{filter.ColumnName:raw} BETWEEN
{filter.FromValue.Value} AND {filter.ToValue.Value}");
return query.Query<Sale>().AsList();
}
Or without the magic word /**where**/:
using (var conn = OpenConnection())
{
var query = conn.QueryBuilder($@"
SELECT *
FROM Sales
WHERE 1=1
");
foreach (var filter in filterRanges)
query.Append($@"AND {filter.ColumnName:raw} BETWEEN
{filter.FromValue.Value} AND {filter.ToValue.Value}");
query.Append($"order by 1 ASC OFFSET {skip} ROWS FETCH NEXT {take} ROWS ONLY");
return query.Query<Sale>().AsList();
}
The output is fully parametrized SQL, even though it looks like we're doing plain string concatenation.
Disclaimer: I'm one of the authors of this library
I was able to find a solution for this. The key was to convert the List to a Dictionary. I created a private method:
private Dictionary<string, object> CreateParametersDictionary(List<FilterRange> filters, int skip = 0, int take = 0)
{
var dict = new Dictionary<string, object>()
{
{ "@skip", skip },
{ "@take", take },
};
for (var i = 0; i < filters.Count; i++)
{
dict.Add($"column_{i}", filters[i].Filter.Description);
// some logic here which determines how you parse
// I used a switch, not shown here for brevity
dict.Add($"@fromVal_{i}", int.Parse(filters[i].FromValue.Value));
dict.Add($"@toVal_{i}", int.Parse(filters[i].ToValue.Value));
}
return dict;
}
Then to build my query,
var ranges = " WHERE 1 = 1 ";
for(var i = 0; i < filterRanges.Count; i++)
ranges += $" AND {filter[$"column_{i}"]} BETWEEN @fromVal_{i} AND @toVal_{i} ";
Special note: Be very careful here, as the column name is not a parameter and you could open yourself up to injection attacks (as @Popa noted in his answer). In my case those values come from an enum class and not from user input, so I am safe (see the whitelist sketch after the code below).
The rest is pretty straightforward:
using (var conn = OpenConnection())
{
string query = @"Select * from Sales "
+ ranges
+ skipTake;
return conn.Query<Sale>(query, filter).AsList();
}
When using the C# code below to construct a DB2 SQL query the result set only has one row. If I manually construct the "IN" predicate inside the cmdTxt string using string.Join(",", ids) then all of the expected rows are returned. How can I return all of the expected rows using the db2Parameter object instead of building the query as a long string to be sent to the server?
public object[] GetResults(int[] ids)
{
var cmdTxt = "SELECT DISTINCT ID,COL2,COL3 FROM TABLE WHERE ID IN ( @ids ) ";
var db2Command = _DB2Connection.CreateCommand();
db2Command.CommandText = cmdTxt;
var db2Parameter = db2Command.CreateParameter();
db2Parameter.ArrayLength = ids.Length;
db2Parameter.DB2Type = DB2Type.DynArray;
db2Parameter.ParameterName = "@ids";
db2Parameter.Value = ids;
db2Command.Parameters.Add(db2Parameter);
var results = ExecuteQuery(db2Command);
return results.ToArray();
}
private object[] ExecuteQuery(DB2Command db2Command)
{
_DB2Connection.Open();
var resultList = new ArrayList();
var results = db2Command.ExecuteReader();
while (results.Read())
{
var values = new object[results.FieldCount];
results.GetValues(values);
resultList.Add(values);
}
results.Close();
_DB2Connection.Close();
return resultList.ToArray();
}
You cannot send in an array as a parameter. You would have to do something to build out a list of parameters, one for each of your values.
e.g.: SELECT DISTINCT ID,COL2,COL3 FROM TABLE WHERE ID IN ( @id1, @id2, ... @idN )
And then add the values to your parameter collection:
cmd.Parameters.Add("@id1", DB2Type.Integer).Value = your_val;
Additionally, there are a few things I would do to improve your code:
Use using statements around your DB2 objects. This will automatically dispose of the objects correctly when they go out of scope. If you don't do this, eventually you will run into errors. This should be done on DB2Connection, DB2Command, DB2Transaction, and DB2Reader objects especially.
I would recommend that you wrap queries in a transaction object, even for selects. With DB2 (and my experience is with z/OS mainframe, here... it might be different for AS/400), it writes one "accounting" record (basically the work that DB2 did) for each transaction. If you don't have an explicit transaction, DB2 will create one for you, and automatically commit after every statement, which adds up to a lot of backend records that could be combined.
My personal opinion would also be to create a .NET class to hold the data that you are getting back from the database. That would make it easier to work with using IntelliSense, among other things (because you would be able to auto-complete the property name, and .NET would know the type of the object). Right now, with the array of objects, if your column order or data type changes, it may be difficult to find/debug those usages throughout your code.
I've included a version of your code that I re-wrote that has some of these changes in it:
public List<ReturnClass> GetResults(int[] ids)
{
using (var conn = new DB2Connection())
{
conn.Open();
using (var trans = conn.BeginTransaction(IsolationLevel.ReadCommitted))
using (var cmd = conn.CreateCommand())
{
cmd.Transaction = trans;
var parms = new List<string>();
var idCount = 0;
foreach (var id in ids)
{
var parm = "@id" + idCount++;
parms.Add(parm);
cmd.Parameters.Add(parm, DB2Type.Integer).Value = id;
}
cmd.CommandText = "SELECT DISTINCT ID,COL2,COL3 FROM TABLE WHERE ID IN ( " + string.Join(",", parms) + " ) ";
var resultList = new List<ReturnClass>();
using (var reader = cmd.ExecuteReader())
{
while (reader.Read())
{
var values = new ReturnClass();
values.Id = (int)reader["ID"];
values.Col2 = reader["COL2"].ToString();
values.Col3 = reader["COL3"].ToString();
resultList.Add(values);
}
}
return resultList;
}
}
}
public class ReturnClass
{
public int Id;
public string Col2;
public string Col3;
}
Try changing from:
db2Parameter.DB2Type = DB2Type.DynArray;
to:
db2Parameter.DB2Type = DB2Type.Integer;
This is based on the example given here
I have a C# application which retrieves an SQL result set in the following format:
| customer_id | date_registered | date_last_purchase | loyalty_points |
| --- | --- | --- | --- |
| 1 | 2017-01-01 | 2017-05-02 | 51 |
| 2 | 2017-01-23 | 2017-06-21 | 124 |
| ... | | | |
How can I convert this to a JSON string, such that the first column (customer_id) is a key, and all other subsequent columns are values within a nested-JSON object for each customer ID?
Example:
{
1: {
date_registered: '2017-01-01',
date_last_purchase: '2017-05-02',
loyalty_points: 51,
...
},
2: {
date_registered: '2017-01-23',
date_last_purchase: '2017-06-21',
loyalty_points: 124,
...
},
...
}
Besides date_registered, date_last_purchase, and loyalty_points, there may be other columns in the future so I do not want to refer to these column names specifically. Therefore I have already used the code below to fetch the column names, but am stuck after this.
SqlDataReader sqlDataReader = sqlCommand.ExecuteReader();
var columns = new List<string>();
for (var i = 0; i < sqlDataReader.FieldCount; i++)
{
columns.Add(sqlDataReader.GetName(i));
}
while (sqlDataReader.Read())
{
rows.Add(columns.ToDictionary(column => column, column => sqlDataReader[column]));
}
You could use something like this to convert the data reader to a Dictionary<object, Dictionary<string, object>> and then use Json.NET to convert that to JSON:
var items = new Dictionary<object, Dictionary<string, object>>();
while (sqlDataReader.Read())
{
var item = new Dictionary<string, object>(sqlDataReader.FieldCount - 1);
for (var i = 1; i < sqlDataReader.FieldCount; i++)
{
item[sqlDataReader.GetName(i)] = sqlDataReader.GetValue(i);
}
items[sqlDataReader.GetValue(0)] = item;
}
var json = Newtonsoft.Json.JsonConvert.SerializeObject(items, Newtonsoft.Json.Formatting.Indented);
Update: JSON "names" are always strings, so I used object and GetValue for the keys.
I am trying to insert data from a text file into Microsoft SQL Server using C#. I have two tables, Transaction and TMatch, that I want to populate with the data, four attributes each. I have created two classes, one for each. I am aware of how to insert data manually into the database through .Add() and .SaveChanges().
Here is what I have so far:
//Database insertions
TTransaction txn = new TTransaction();
txn.Amount = 56; // I want a variable used below (AMOUNT) to go into Amount
txn.TRN = "sdfgsdfg"; // (TxnNo) to go into TRN
ScotiaNYAEntities context = new ScotiaNYAEntities();
context.TTransactions.Add(txn);
context.SaveChanges();
I traverse the text file using a while loop:
{
if (line.Contains("AMOUNT:")) //Look where to end for Transaction Text
{
// For Amount
IsAmount=true;
if(IsAmount)
{
Amount = line.Replace("AMOUNT:", String.Empty).Trim();
Console.WriteLine("AMOUNT: ********");
Console.WriteLine(Amount);
}
}
...
I am not sure how to reference a variable instead of just values.
Thank you.
It's a leap of faith, but you could have something like this:
using (ScotiaNYAEntities context = new ScotiaNYAEntities())
{
foreach (string line in File.ReadLines(pathToFile))
{
if (line.Contains("AMOUNT:"))
{
if (IsAmount)
{
string amount = line.Replace("AMOUNT:", string.Empty).Trim();
TTransaction txn = new TTransaction();
txn.Amount = amount;
txn.TRN = "sdfgsdfg";
context.TTransactions.Add(txn);
}
}
}
context.SaveChanges();
}
This is what I did, inside the loop that reads the file line by line:
String TxnLOC = null;
IsTransactionLocation= false;
if (line.Contains("TRANSACTION LOC:"))
{
IsTransactionLocation = true;
if (IsTransactionLocation)
{
TxnLOC = line.Replace("TRANSACTION LOC:", String.Empty).Trim();
Console.WriteLine("The Transaction Location: ********");
Console.WriteLine(TxnLOC);
//Database insertion fot TTransaction Table
TTransaction txn = new TTransaction();
txn.TRN = txnNo;
txn.Amount = Convert.ToDecimal(Amount);
txn.TransactionLocation = TxnLOC;
context.TTransaction.Add(txn); //Adding to the database
context.SaveChanges();
IsTxnSection = false;//For 1 to many relationship
}
}
The following code pulls data from two tables, table1 and table2, performs a JOIN on them over field3, and indexes the result into Elasticsearch. The total number of rows that need indexing is around 500 million. The code inserts 5 million records per hour, so at this rate it will take 100 hours to complete. Is there any way I can make it faster?
public static void selection()
{
Uri node = new Uri("http://localhost:9200");
ConnectionSettings settings = new ConnectionSettings(node);
ElasticClient client = new ElasticClient(settings);
int batchsize = 100;
string query = "select table1.field1, table2.field2 from table1 JOIN table2 ON table1.field3=table2.field3";
try
{
OracleCommand command = new OracleCommand(query, con);
OracleDataReader reader = command.ExecuteReader();
List<Record> l = new List<Record>(batchsize);
string[] str = new string[2];
int currentRow = 0;
while (reader.Read())
{
for (int i = 0; i < 2; i++)
str[i] = reader[i].ToString();
l.Add(new Record(str[0], str[1]));
if (++currentRow == batchsize)
{
Commit(l, client);
l.Clear();
currentRow = 0;
}
}
Commit(l, client);
}
catch(Exception er)
{
Console.WriteLine(er.Message);
}
}
public static void Commit(List<Record> l, ElasticClient client)
{
BulkDescriptor a = new BulkDescriptor();
foreach (var x in l)
a.Index<Record>(op => op.Object(x).Index("index").Type("type"));
var res = client.Bulk(d => a);
Console.WriteLine("100 records more inserted.");
}
Any help is appreciated! :)
Can you try using the lower-level client, i.e. ElasticsearchClient?
Here is a sample:
//Fill data in ElasticDataRows
StringBuilder ElasticDataRows = new StringBuilder();
ElasticDataRows.AppendLine("{ \"index\": { \"_index\": \"testindex\", \"_type\": \"Accounts\" }}");
ElasticDataRows.AppendLine(JsonConvert.SerializeXmlNode(objXML, Newtonsoft.Json.Formatting.None, true));
var node = new Uri(objExtSetting.SelectSingleNode("settings/ElasticSearchURL").InnerText);
var config = new ConnectionConfiguration(node);
ElasticsearchClient objElasticClient = new ElasticsearchClient(config);
//Insert data to ElasticSearch
var response = objElasticClient.Bulk(ElasticDataRows.ToString());
ElasticsearchClient is not strongly typed like NEST, so you convert your class objects to JSON yourself using Newtonsoft.Json.
As per my testing, this is faster than the NEST API.
We have 40-50 databases that we reindex each month, each with 1 to 8 million rows. The difference is that I take the data from MongoDB. What I do to make it faster is use Parallel.ForEach with 32 threads running and inserting into Elasticsearch. I insert one record at a time because I need to calculate things for each of them, but you just take them from the DB and insert them into Elasticsearch, so the bulk insert seems better. You could try using 3-4 threads with that bulk insert.
So split your table into 4, then start separate threads that bulk insert into Elasticsearch (see the sketch below). From what I have seen, I'm pretty sure the read from the DB takes the biggest part of the time. Also, I think you should try a batch size greater than 100.
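To illustrate that suggestion (a sketch, not a drop-in solution): the Commit helper and Record class from the question could be reused from a few parallel workers, each reading its own slice of the join. The MOD-based partitioning, the degree of parallelism, and the batch size below are assumptions you would need to tune, and each worker is assumed to open its own Oracle connection.

// A rough sketch of the "split the work and bulk insert from several threads" idea.
// It reuses the Record class and the Commit(List<Record>, ElasticClient) helper from
// the question. The MOD(field3, 100) partitioning, the connection handling and the
// degree of parallelism are assumptions to adapt to your data.
public static void IndexPartitionsInParallel(ElasticClient client, string connectionString)
{
    int batchSize = 1000; // try batches well above 100
    var partitions = new[] { (0, 25), (25, 50), (50, 75), (75, 100) };

    System.Threading.Tasks.Parallel.ForEach(
        partitions,
        new System.Threading.Tasks.ParallelOptions { MaxDegreeOfParallelism = 4 },
        part =>
        {
            // Each worker gets its own connection and reads only its slice of the join.
            using (var conn = new OracleConnection(connectionString))
            {
                conn.Open();
                string query =
                    "select table1.field1, table2.field2 " +
                    "from table1 JOIN table2 ON table1.field3 = table2.field3 " +
                    "where MOD(table1.field3, 100) >= " + part.Item1 +
                    " and MOD(table1.field3, 100) < " + part.Item2;
                using (var command = new OracleCommand(query, conn))
                using (var reader = command.ExecuteReader())
                {
                    var buffer = new List<Record>(batchSize);
                    while (reader.Read())
                    {
                        buffer.Add(new Record(reader[0].ToString(), reader[1].ToString()));
                        if (buffer.Count == batchSize)
                        {
                            Commit(buffer, client); // existing bulk helper from the question
                            buffer.Clear();
                        }
                    }
                    if (buffer.Count > 0)
                        Commit(buffer, client);
                }
            }
        });
}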