Getting column names dynamically in SSIS script component acting as Destination - c#

OK, I am writing an Avro file using an SSIS script component as a destination. Since Avro needs a schema, I have to define one. It works fine when I define the schema manually, but I have 10-12 data flow tasks and I do not want to write the schema out explicitly for each. I am trying to see whether I can read the column list from the auto-generated BufferWrapper, but it always comes back blank.
I have tried the solution posted here and also read up on this, but everything returns blank.
I have also come across this. Could that be the reason? And if the explanation in the answer posted there is correct, is this even possible?
So, in my public override void PreExecute() I have something like this:
Schema = @"{
""type"":""record"",
""name"":""Microsoft.Hadoop.Avro.Specifications.Counterparts"",
""fields"":
[
{ ""name"":""CounterpartID"", ""type"":""int"" },
{ ""name"":""CounterpartFirstDepositDate"", ""type"":[""string"",""null""] },
{ ""name"":""CounterpartFirstTradeDate"",""type"":[""string"",""null""] },
{ ""name"":""ClientSegmentReportingID"",""type"":[""int"",""null""] },
{ ""name"":""ClientSegmentReportingName"", ""type"":[""string"",""null""] },
{ ""name"":""ContractID"", ""type"":[""int"",""null""] },
{ ""name"":""ContractFirstDepositDate"", ""type"":[""string"",""null""] },
{ ""name"":""ContractFirstTradeDate"",""type"":[""string"",""null""] },
{ ""name"":""ContractClosingOffice"",""type"":[""string"",""null""] },
{ ""name"":""LeadCreationDate"", ""type"":[""string"",""null""] },
{ ""name"":""ContractCountryOfResidence"", ""type"":[""string"",""null""] }
]
}";
}
Instead of defining the whole schema manually, I want to generate it from the BufferWrapper, but this returns blank:
var fields = typeof(Input0Buffer).GetFields().Select(m => new
{
Name = m.Name,
Type = m.FieldType
}).ToList();
Also, if I just do this, it also returns blank:
Type myType = typeof(Input0Buffer);
// Get the fields of the specified class.
FieldInfo[] myField = myType.GetFields();
Earlier I put these calls in PreExecute, but then thought the buffer might not be initialized by that point, so I moved them to the Input0_ProcessInputRow method, guarded by a counter variable so the code runs only once (when counter == 0). But even that returns blank.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
if (counter == 0)
{
Type myType = typeof(Input0Buffer);
// Get the fields of the specified class.
FieldInfo[] myField = myType.GetFields();
}
//Processing logic
}
Is this impossible because of that? The answer there says the buffer members are protected and not accessible from outside the autogenerated class.
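For what it's worth, a likely reason both reflection attempts come back blank is that the auto-generated Input0Buffer exposes each column as a public property (with an _IsNull companion), not as a field, so GetFields() finds nothing to return. A sketch reflecting over properties instead (untested; note these would be the mangled buffer names, not the original data flow column names):

```csharp
// The wrapper exposes columns as public properties, not fields, so
// GetFields() returns an empty array. Reflect over properties instead
// and skip the generated *_IsNull helpers. Requires System.Linq.
var columns = typeof(Input0Buffer)
    .GetProperties()
    .Where(p => !p.Name.EndsWith("_IsNull"))
    .Select(p => new { p.Name, Type = p.PropertyType })
    .ToList();
```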

I have finally found the answer here: https://waheedrous.wordpress.com/2014/02/24/ssis-global-replace-for-all-columns-using-a-script-component/
I can now get the list of columns and data types from within the script component, running the code only once (guarded by a counter variable).
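In case that link goes stale: as I understand the approach, it reads column names and types from the component metadata rather than reflecting on the buffer wrapper. A rough sketch (the columnsCaptured flag and loop body are my own illustration, not code from the post):

```csharp
// Sketch: read column names and SSIS data types from the component
// metadata, once, on the first buffer that arrives.
private bool columnsCaptured = false;

public override void ProcessInput(int InputID, PipelineBuffer Buffer)
{
    if (!columnsCaptured)
    {
        IDTSInput100 input = ComponentMetaData.InputCollection.GetObjectByID(InputID);
        foreach (IDTSInputColumn100 column in input.InputColumnCollection)
        {
            // column.Name is the original data flow name (underscores intact);
            // column.DataType is the SSIS DT_* type.
            string name = column.Name;
            DataType dtsType = column.DataType;
        }
        columnsCaptured = true;
    }
    base.ProcessInput(InputID, Buffer);
}
```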

Related

Column names are not loading correctly c# scripting in SSIS

I'm new to scripting in C# using SSIS, so I'm not sure why this is happening. I'm using a script component to read some files, and my column names that contain an underscore are not showing correctly.
The error is:
Severity Code Description Project File Line Suppression State
Error CS1061 'Output0Buffer' does not contain a definition for 'funded_amnt' and no accessible extension method 'funded_amnt' accepting a first argument of type 'Output0Buffer' could be found (are you missing a using directive or an assembly reference?)
When you add columns to a Script Component, a translation happens between the data flow column name and the name available within the script. I don't know the exact rules off the top of my head, but given that every column with an underscore in your data flow is reported to be in error, I'm guessing the underscore is a character that's valid in a column name but not allowed in a buffer's field name.
If I were to guess why: every column in a script gets an _IsNull property added to it, and perhaps extra underscores would complicate that logic somewhere.
Given a source query of
SELECT 1 AS col1, 2 AS col_underscore_2;
I get two columns in my data flow task named col1 and col_underscore_2. I added both to a Script Component.
This is the resulting auto-generated definition of my input buffer (BufferWrapper.cs)
public class Input0Buffer: ScriptBuffer
{
public Input0Buffer(PipelineBuffer Buffer, int[] BufferColumnIndexes, OutputNameMap OutputMap)
: base(Buffer, BufferColumnIndexes, OutputMap)
{
}
public Int32 col1
{
get
{
return Buffer.GetInt32(BufferColumnIndexes[0]);
}
}
public bool col1_IsNull
{
get
{
return IsNull(0);
}
}
public Int32 colunderscore2
{
get
{
return Buffer.GetInt32(BufferColumnIndexes[1]);
}
}
public bool colunderscore2_IsNull
{
get
{
return IsNull(1);
}
}
new public bool NextRow()
{
return base.NextRow();
}
new public bool EndOfRowset()
{
return base.EndOfRowset();
}
}
Of note, it mangles col_underscore_2 into an internal column name of colunderscore2. If you want to see the same for yours, either double-click BufferWrapper.cs in Solution Explorer, or put your cursor on Input0Buffer in public override void Input0_ProcessInputRow(Input0Buffer Row) and hit F12 (Go to Definition). I suspect similar logic applies to buffer names.
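If that guess is right, the rule could be approximated (this is speculation, not documented behavior) as stripping every character that isn't a letter or digit:

```csharp
// Guess at the mangling rule: drop anything that isn't a letter or digit.
// e.g. "col_underscore_2" would become "colunderscore2".
static string ToBufferName(string dataFlowName)
{
    return System.Text.RegularExpressions.Regex.Replace(dataFlowName, "[^A-Za-z0-9]", "");
}
```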
I assume you're attempting to auto-generate the script, and you'll want to take that name translation into consideration. Also, if LendingClassRow could contain a null, you'll need logic like:
if(LendingClassRow.verification_status != null)
{
Output0Buffer.verificationstatus = LendingClassRow.verification_status;
}
// This entire block is not needed as it will be the default if no value is
// assigned but I call it out here in case future readers have a need to
// muck with it
else
{
Output0Buffer.verificationstatus_IsNull = true;
}

How to pass mapped array to PageMethods?

I've been trying to pass a data array to a C# web method using jQuery. I have a table with selectable rows, and my function must pass the ids of the selected rows. But I can't pass the object array through PageMethods.
Here is my jQuery function:
function DeleteQuestions()
{
var table = $('#questTable').DataTable();
var data = (table.rows('.selected').data()).map(function (d) { return d.q_id; });
PageMethods.Delete(data);
}
When I debug it with Firebug, the variable data looks like Object["543","544","546"], as I wanted.
And here is my Web Method:
[WebMethod]
public static void Delete(List<string> questId)
{
DB_UserControl carrier = new DB_UserControl(); //break pointed here
}//and looks it doesn't come here
It doesn't work, and the error is: Cannot serialize object with cyclic reference within child properties. I've searched for the error but couldn't figure it out, so I need some help. Thanks in advance.
Note: the error is thrown at the script function's last line: PageMethods.Delete(data);
I think the mapped data may be causing some kind of loop behavior.
I solved the problem by changing the syntax. I use
var data = $.map(table.rows('.selected').data(), function (d) {
return d.q_id;
});
instead of the original line. I don't know for certain what caused the error, but presumably table.rows('.selected').data() returns a DataTables API instance, and calling .map() on it yields another API instance whose internal references are cyclic, which the ASP.NET serializer rejects; $.map() produces a plain JavaScript array that serializes cleanly. In any case, this code works fine and I get the data in C#. Thank you all.

Concurrent execution of Cassandra prepared statement returns invalid JSON as result

When I use a prepared statement for async execution of multiple statements, I get JSON with broken data: keys and values get totally corrupted.
I first encountered this issue while performing stress testing of our project using a custom script. We are using the DataStax C++ driver and execute statements from different fibers.
I then tried to isolate the problem and wrote a simple C# program that starts multiple Tasks in a loop. Each task uses a single prepared statement (created once) to read data from the database. For some rows the result is a total mess, e.g.:
Expected (fetched by cqlsh)
516b00a2-01a7-11e6-8630-c04f49e62c6b |
lucid_lynx_value_45404 |
precise_pangolin_value_618429 |
saucy_salamander_value_302796 |
trusty_tahr_value_873 |
vivid_vervet_value_216045 |
wily_werewolf_value_271991
Actual
{
"sa": "516b00a2-01a7-11e6-8630-c04f49e62c6b",
"lucid_lynx": "wily_werewolflue_45404",
"precise_pangolin": "precise_pangolin_value_618429",
"saucy_salamander": "saucy_salamander_value_302796",
"trusty_tahr": "trusty_tahr_value_873",
"vivid_vervet": "vivid_vervet_value_216045",
"wily_werewolf": "wily_werewolf_value_271991"
}
Here is the main part of C# code.
static void Main(string[] args)
{
const int task_count = 300;
using(var cluster = Cluster.Builder().AddContactPoints(/*contact points here*/).Build())
{
using(var session = cluster.Connect())
{
var prepared = session.Prepare("select json * from test_neptunao.ubuntu where id=?");
var tasks = new Task[task_count];
for(int i = 0; i < task_count; i++)
{
tasks[i] = Query(prepared, session);
}
Task.WaitAll(tasks);
}
}
Console.ReadKey();
}
private static Task Query(PreparedStatement prepared, ISession session)
{
string id = GetIdOfRandomRow();
var stmt = prepared.Bind(id);
stmt.SetConsistencyLevel(ConsistencyLevel.One);
return session.ExecuteAsync(stmt).ContinueWith(tr =>
{
foreach(var row in tr.Result)
{
var value = row.GetValue<string>(0);
//some kind of output
}
});
}
CQL script with test DB schema.
CREATE KEYSPACE IF NOT EXISTS test_neptunao
WITH replication = {
'class' : 'SimpleStrategy',
'replication_factor' : 3
};
use test_neptunao;
create table if not exists ubuntu (
id timeuuid PRIMARY KEY,
precise_pangolin text,
trusty_tahr text,
wily_werewolf text,
vivid_vervet text,
saucy_salamander text,
lucid_lynx text
);
UPD
Expected JSON
{
"id": "516b00a2-01a7-11e6-8630-c04f49e62c6b",
"lucid_lynx": "lucid_lynx_value_45404",
"precise_pangolin": "precise_pangolin_value_618429",
"saucy_salamander": "saucy_salamander_value_302796",
"trusty_tahr": "trusty_tahr_value_873",
"vivid_vervet": "vivid_vervet_value_216045",
"wily_werewolf": "wily_werewolf_value_271991"
}
UPD2
Here is the sample C# project mentioned above.
UPD3
The issue was resolved after upgrading to Cassandra 3.5.
It sounds like you're seeing CASSANDRA-11048 (JSON Queries are not thread safe). Upgrading Cassandra to a version with the fix is the best way to resolve this.
The only error I see in the generated JSON is the name of the primary key which should be "id" instead of "sa". Otherwise the other columns are correct.
{
"sa": "516b00a2-01a7-11e6-8630-c04f49e62c6b",
"lucid_lynx": "wily_werewolflue_45404",
"precise_pangolin": "precise_pangolin_value_618429",
"saucy_salamander": "saucy_salamander_value_302796",
"trusty_tahr": "trusty_tahr_value_873",
"vivid_vervet": "vivid_vervet_value_216045",
"wily_werewolf": "wily_werewolf_value_271991"
}
What kind of JSON structure did you expect to get as a result?

Two C# source Script Components in ssis result in "Object reference not set to an instance of an object"

Really weird error. I have two independently working C# source scripts in SSIS. Basically they go and grab information from an external CRM source.
When they are both enabled in the same package, on the first script that executes I get:
Object reference not set to an instance of an object
This works:
This does not; it freezes on the first script.
I would think it might be a buffer issue, but then it would still complete the first script before throwing the error. Both scripts have unique IDs and GUIDs.
Debugging is useless; it doesn't stop on any of the code I've written. I'm stumped.
This is ScriptThree.CreateNewOutputRows(). Important to note that ScriptThree is part of the second data flow task.
public override void CreateNewOutputRows()
{
/*
Add rows by calling the AddRow method on the member variable named "<Output Name>Buffer".
For example, call MyOutputBuffer.AddRow() if your output was named "MyOutput".
*/
QueryExpression query = new QueryExpression("email")
{
ColumnSet = new ColumnSet(new string[] { "subject", "regardingobjectid", "createdon", "directioncode" }),
PageInfo = new PagingInfo()
{
Count = 250,
PageNumber = 1,
ReturnTotalRecordCount = false
}
};
EntityCollection results = null;
do
{
results = organizationservice.RetrieveMultiple(query);
foreach (Entity record in results.Entities)
{
emailBuffer.AddRow();
emailBuffer.emailid = record.Id;
if (record.Contains("subject"))
emailBuffer.subject = record.GetAttributeValue<string>("subject");
if (record.Contains("regardingobjectid"))
emailBuffer.regarding = record.GetAttributeValue<EntityReference>("regardingobjectid").Id;
if (record.Contains("createdon"))
emailBuffer.createdon = record.GetAttributeValue<DateTime>("createdon");
if (record.Contains("directioncode"))
emailBuffer.directioncode = record.GetAttributeValue<bool>("directioncode");
}
query.PageInfo.PageNumber++;
query.PageInfo.PagingCookie = results.PagingCookie;
}
while (results.MoreRecords);
}
I still don't know what the exact cause was, but I copied and pasted my script into a new script object and it suddenly started working again.
I suppose you copied and pasted the whole script component and then modified the code in the second one, so you had two script components with the same ComponentScriptId. That's why adding a new one fixed the issue.

System.Text.RegularExpressions.Regex.Replace error in C# for SSIS

I am using the below code in an SSIS package written in C#, and when I build it I get an error.
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Text.RegularExpressions;
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{
public override void PreExecute()
{
base.PreExecute();
}
public override void PostExecute()
{
base.PostExecute();
}
string toreplace = "[~!##$%^&*()_+`{};':,./<>?]";
string replacewith = "";
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
Regex reg = new Regex(toreplace);
Row.NaN = reg.Replace(Row.Na, replacewith);
}
}
The error is
The best overloaded method match for
'System.Text.RegularExpressions.Regex.Replace(string,System.Text.RegularExpressions.MatchEvaluator)' has some invalid arguments
Here Na is the input column and NaN is the output column; both are varchar, with special characters in the input column.
Exceptions:
System.ArgumentNullException
System.ArgumentOutOfRangeException
This is the code in the BufferWrapper in the SSIS package
/* THIS IS AUTO-GENERATED CODE THAT WILL BE OVERWRITTEN! DO NOT EDIT!
* Microsoft SQL Server Integration Services buffer wrappers
* This module defines classes for accessing data flow buffers
* THIS IS AUTO-GENERATED CODE THAT WILL BE OVERWRITTEN! DO NOT EDIT! */
using System;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
public class Input0Buffer: ScriptBuffer
{
public Input0Buffer(PipelineBuffer Buffer, int[] BufferColumnIndexes, OutputNameMap OutputMap)
: base(Buffer, BufferColumnIndexes, OutputMap)
{
}
public BlobColumn Na
{
get
{
return (BlobColumn)Buffer[BufferColumnIndexes[0]];
}
}
public bool Na_IsNull
{
get
{
return IsNull(0);
}
}
public Int32 NaN
{
set
{
this[1] = value;
}
}
public bool NaN_IsNull
{
set
{
if (value)
{
SetNull(1);
}
else
{
throw new InvalidOperationException("IsNull property cannot be set to False. Assign a value to the column instead.");
}
}
}
new public bool NextRow()
{
return base.NextRow();
}
new public bool EndOfRowset()
{
return base.EndOfRowset();
}
}
Data flow
Script component, input columns
Script component, actual script
Your code is mostly fine, but you are not testing for the possibility that the Na column is NULL. Perhaps your source data doesn't allow nulls and thus there's no need to test.
You can improve performance by scoping the Regex at the member level and instantiating it in your PreExecute method, but that's just a performance thing; it has no bearing on the error message you are receiving.
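Sketched out, that suggestion looks something like this (the pattern string is from the question; RegexOptions.Compiled and the null guard are my additions):

```csharp
// Build the Regex once per component execution instead of once per row.
// Requires using System.Text.RegularExpressions;
public class ScriptMain : UserComponent
{
    private Regex reg;

    public override void PreExecute()
    {
        base.PreExecute();
        reg = new Regex("[~!##$%^&*()_+`{};':,./<>?]", RegexOptions.Compiled);
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (!Row.Na_IsNull)
        {
            Row.NaN = reg.Replace(Row.Na, "");
        }
    }
}
```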
You can see my package and the expected results. I sent 4 rows down, one with a NULL value, one that shouldn't change and two that have changes required.
My data Flow
I have updated my data flow to match the steps you are using in your chameleon question.
My Source Query
I generate 2 columns of data and 4 rows worth. The Na column, which matches your original question is of type varchar. The column Agency_Names is cast as the deprecated Text data type to match your subsequent updates.
SELECT
D.Na
, CAST(D.Na AS text) AS Agency_Names
FROM
(
SELECT 'Hello world' AS Na
UNION ALL SELECT 'man~ana'
UNION ALL SELECT 'p#$$word!'
UNION ALL SELECT NULL
) D (Na);
Data Conversion
I have added a Data Conversion Transformation after my OLE DB Source. Reflecting what you have done, I converted my Agency_Name to a data type of string [DT_STR] with a length of 50 and aliased it as "Copy of Agency_Name".
Metadata
At this point, I verify that the metadata for my data flow is of type DT_STR or DT_WSTR which are the only allowable inputs for the upcoming call to the regular expression. I confirm that Copy of Agency_Names is the expected data type.
Script Task
I assigned ReadOnly usage to the columns Na and Copy of Agency_Name and aliased the later as "AgencyNames".
I added 2 output columns: NaN which matches your original question and created AgencyNamesCleaned. These are both configured to be DT_STR, codepage 1252, length of 50.
This is the script I used.
public class ScriptMain : UserComponent
{
string toreplace = "[~!##$%^&*()_+`{};':,./<>?]";
string replacewith = "";
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
Regex reg = new Regex(toreplace);
// Test for nulls otherwise Replace will blow up
if (!Row.Na_IsNull)
{
Row.NaN = reg.Replace(Row.Na, replacewith);
}
else
{
Row.NaN_IsNull = true;
}
if (!Row.AgencyNames_IsNull)
{
Row.AgencyNamesCleaned = reg.Replace(Row.AgencyNames, replacewith);
}
else
{
Row.AgencyNamesCleaned_IsNull = true;
}
}
}
Root cause analysis
I think your core issue may be that the Na column isn't a string-compatible type. Sriram's comment is spot on. If I look at the autogenerated code for the column Na in my example, I see
public String Na
{
get
{
return Buffer.GetString(BufferColumnIndexes[0]);
}
}
public bool Na_IsNull
{
get
{
return IsNull(0);
}
}
Your source system has provided metadata such that SSIS thinks this column is binary data. Perhaps it's NTEXT/TEXT or n/varchar(max) in the host. You need to do something to make it a compatible operand for the regular expression. I would clean up the column type in the source but if that's not an option, use a Data Conversion transformation to make it into a DT_STR/DT_WSTR type.
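If fixing the source type or adding a Data Conversion transformation isn't an option, the blob could also be decoded inside the script. This is only a sketch, and the encoding is an assumption: Unicode for DT_NTEXT, the column's code page (e.g. 1252) for DT_TEXT.

```csharp
// Sketch: decode a BlobColumn (DT_TEXT/DT_NTEXT) to a string in-script.
// The chosen encoding must match the column's actual type and code page.
int length = Convert.ToInt32(Row.Na.Length);
byte[] bytes = Row.Na.GetBlobData(0, length);
string text = System.Text.Encoding.GetEncoding(1252).GetString(bytes);
// text is now a valid operand for reg.Replace(...)
```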
Denouement
You can observe in the Data Viewer, attached to my first image, that NaN and AgencyNamesCleaned have the offending characters correctly stripped. Furthermore, my Script Component does not have a red X attached to it as yours does; that red X indicates the script is in an invalid state.
Since you created the "Copy of Agency_Names" column in the Data Conversion Component as DT_TEXT, wired it up to the Script Component, and then changed the data type in the Data Conversion Component, the red X on your script might be resolved by having the transformation refresh its metadata. Open the script and click recompile (Ctrl+Shift+B) for good measure.
There should be no underlines in your reg.Replace(...) code. If there are, there is another facet to your problem that has not been communicated. My best advice at that point would be to recreate a proof-of-concept package exactly as I have described; if that works, it becomes an exercise in finding the difference between what you have working and what you do not.
