Windows indexed files OLE DB search with like instead of contains - c#

I need to replace CONTAINS with LIKE in my query. When you search for a term with dashes, such as "12-0430-1", CONTAINS also returns results that contain only 12, 0430, or 1. This is intended behavior and is documented by Microsoft here. That article also describes the solution, but sadly not in a way that helps me. The solution is to replace CONTAINS with LIKE, but I always get an error with this edited query:
SELECT TOP 2000 System.FileName
FROM systemindex
WHERE DIRECTORY = 'C:\Path...'
AND (System.FullText Like ('%12-0430-1%'))
ORDER BY System.DateCreated DESC
The error says that the column does not exist, but for OLE DB search I have only found this older site that specifies which columns can be searched.
Before I did not need it because I used contains:
Ole-DB query: for 22-0130-1
SELECT TOP 2000 System.FileName
FROM systemindex
WHERE DIRECTORY = 'C:\Path'
AND (CONTAINS('"12-0430-1*"'))
ORDER BY System.DateCreated DESC
Could somebody please link a page listing the current columns that can be searched in the Windows index, or tell me which column exists and would work? I want to search through the indexed contents of the files.
Edit: these are the settings with which I create the OleDbConnection:
OleDbConnection connection =
new OleDbConnection("Provider=Search.CollatorDSO;Extended Properties='Application=Windows';");

Related

SSIS Get List of Missing Files

I get a monthly submission of files from various companies that get loaded into SQL Server via an SSIS job. When the SSIS job runs I want to get a list of companies that did not submit a file. The files will have the date appended to the end so I'm assuming it will need to do some sort of wild card search through a list of names. If I'm expecting:
AlphaCO_File_yyyymmdd
BetaCO_File_yyyymmdd
DeltaCO_File_yyyymmdd
ZetaCO_File_yyyymmdd
and the file from ZetaCO is missing, I want to write ZetaCO to a table or save it in a variable I can use in an email task.
I am using Visual Studio 2019 and SQL Server 2019. I have the Task Factory add-on for SSIS.
Note: this answer uses pseudocode that needs to be tuned for your specific values.
My guess is that you already have a foreach loop set up where you are reading the file name to a parameter.
What you need is a table in SQL Server to compare against.
CompanyName, FileSubmitted (bit)
AlphaCO
BetaCO
DeltaCO
ZetaCO
The first step is a SQL command: update table set FileSubmitted = 0.
Then, in each iteration of the foreach loop, add one path that updates the table based on the file name:
Use the TOKEN expression or a C# script task. In C# it would be company = fileName.Split('_')[0];
And then update the table:
update table
set FileSubmitted = 1
where CompanyName = company
Now you can use that table for emails.
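For anyone who wants to check the logic outside SSIS first, the same idea (take everything before the first underscore, then see which expected company never showed up) can be sketched in a few lines of Python. This is only an illustration: the file names, the dates and the .csv extension are made up, and the comparison is case-insensitive on the assumption that the casing of company names may vary.

```python
# Expected company prefixes, taken from the question.
EXPECTED = ["AlphaCO", "BetaCO", "DeltaCO", "ZetaCO"]


def missing_companies(expected, file_names):
    """Return the expected companies for which no file was found."""
    # Company name is everything before the first underscore.
    submitted = {name.split("_")[0].lower() for name in file_names}
    return [company for company in expected if company.lower() not in submitted]


# Hypothetical folder listing for one month (ZetaCO did not submit).
files = [
    "AlphaCO_File_20230101.csv",
    "BetaCO_File_20230101.csv",
    "DeltaCO_File_20230101.csv",
]

missing = missing_companies(EXPECTED, files)
print(missing)
```

The resulting list plays the same role as the rows where FileSubmitted is still 0 in the table approach.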

SQL Server database truncated a big base64 string [duplicate]

How do you view ALL text from an NTEXT or NVARCHAR(max) in SQL Server Management Studio? By default, it only seems to return the first few hundred characters (255?) but sometimes I just want a quick way of viewing the whole field, without having to write a program to do it. Even SSMS 2012 still has this problem :(
I was able to get the full text (99,208 chars) out of a NVARCHAR(MAX) column by selecting (Results To Grid) just that column and then right-clicking on it and then saving the result as a CSV file. To view the result open the CSV file with a text editor (NOT Excel). Funny enough, when I tried to run the same query, but having Results to File enabled, the output was truncated using the Results to Text limit.
The work-around that @MartinSmith described as a comment to the (currently) accepted answer didn't work for me (got an error when trying to view the full XML result complaining about "The '[' character, hexadecimal value 0x5B, cannot be included in a name").
Quick trick-
SELECT CAST('<A><![CDATA[' + CAST(LogInfo as nvarchar(max)) + ']]></A>' AS xml)
FROM Logs
WHERE IDLog = 904862629
In newer versions of SSMS it can be configured in the (Query/Query Options/Results/Grid/Maximum Characters Retrieved) menu:
Old versions of SSMS
Options (Query Results/SQL Server/Results to Grid Page)
To change the options for the current queries, click Query Options on the Query menu, or right-click in the SQL Server Query window and select Query Options.
...
Maximum Characters Retrieved
Enter a number from 1 through 65535 to specify the maximum number of characters that will be displayed in each cell.
Maximum is, as you see, 64k. The default is much smaller.
BTW, Results to Text has an even more drastic limitation:
Maximum number of characters displayed in each column
This value defaults to 256. Increase this value to display larger result sets without truncation. The maximum value is 8,192.
I have written an add-in for SSMS and this problem is fixed there. You can use one of 2 ways:
you can use "Copy current cell 1:1" to copy original cell data to clipboard:
http://www.ssmsboost.com/Features/ssms-add-in-copy-results-grid-cell-contents-line-with-breaks
Or, alternatively, you can open cell contents in external text editor (notepad++ or notepad) using "Cell visualizers" feature: http://www.ssmsboost.com/Features/ssms-add-in-results-grid-visualizers
(The feature lets you open the contents of a field in any external application: if you know it is text, you open it in a text editor; if the contents are binary data with a picture, you select view as picture.)
Return data as XML
SELECT CONVERT(XML, [Data]) AS [Value]
FROM [dbo].[FormData]
WHERE [UID] LIKE '{my-uid}'
Make sure you set a reasonable limit in the SSMS options window, depending on the result you're expecting.
This will work if the text you're returning doesn't contain unencoded characters, like & instead of &amp;, which will cause the XML conversion to fail.
Returning data using PowerShell
For this you will need the PowerShell SQL Server module installed on the machine on which you'll be running the command.
If you're all set up, configure and run the following script:
Invoke-Sqlcmd -Query "SELECT [Data] FROM [dbo].[FormData] WHERE [UID] LIKE '{my-uid}'" -ServerInstance "database-server-name" -Database "database-name" -Username "user" -Password "password" -MaxCharLength 10000000 | Out-File -filePath "C:\db_data.txt"
Make sure you set the -MaxCharLength parameter to a value that suits your needs.
I was successful with this method today. It's similar to the other answers in that it also converts the contents to XML, just using a different method. As I didn't see FOR XML PATH mentioned amongst the answers, I thought I'd add it for completeness:
SELECT [COL_NVARCHAR_MAX]
FROM [SOME_TABLE]
FOR XML PATH(''), ROOT('ROOT')
This will deliver a valid XML containing the contents of all rows, nested in an outer <ROOT></ROOT> element. The contents of the individual rows will each be contained within an element that, for this example, is called <COL_NVARCHAR_MAX>. The name of that can be changed using an alias via AS.
Special characters like &, < or > or similar will be converted to their respective entities. So you may have to convert &lt;, &gt; and &amp; back to their original characters, depending on what you need to do with the result.
EDIT
I just realized that CDATA can be specified using FOR XML too. I find it a bit cumbersome though. This would do it:
SELECT 1 as tag, 0 as parent, [COL_NVARCHAR_MAX] as [COL_NVARCHAR_MAX!1!!CDATA]
FROM [SOME_TABLE]
FOR XML EXPLICIT, ROOT('ROOT')
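If the FOR XML PATH output is processed outside SSMS, a regular XML parser will convert the entities back for you. A minimal Python sketch, assuming the XML has been saved or copied out as text; the sample content here is made up, only the element names follow the example above:

```python
import xml.etree.ElementTree as ET

# Made-up sample of what FOR XML PATH(''), ROOT('ROOT') might return for two rows.
xml_text = (
    "<ROOT>"
    "<COL_NVARCHAR_MAX>a &lt; b &amp;&amp; c &gt; d</COL_NVARCHAR_MAX>"
    "<COL_NVARCHAR_MAX>second row</COL_NVARCHAR_MAX>"
    "</ROOT>"
)

root = ET.fromstring(xml_text)

# The parser decodes &lt;, &gt; and &amp; while reading the element text.
values = [element.text for element in root.findall("COL_NVARCHAR_MAX")]

for value in values:
    print(value)
```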
PowerShell Alternative
This is an old post and I read through the answers. Still, I found it a bit too painful to output multi-line large text fields unaltered from SSMS. I ended up writing a small C# program for my needs, but got to thinking it could probably be done using the command line. Turns out, it is fairly easy to do so with PowerShell.
Start by installing the SqlServer module from an administrative PowerShell.
Install-Module -Name SqlServer
Use Invoke-Sqlcmd to run your query:
$Rows = Invoke-Sqlcmd -Query "select BigColumn from SomeTable where Id = 123" `
-MaxCharLength 2147483647 -ConnectionString $ConnectionString
This will return an array of rows that you can output to the console as follows:
$Rows[0].BigColumn
Or output to a file as follows:
$Rows[0].BigColumn | Out-File -FilePath .\output.txt -Encoding UTF8
The result is a beautiful un-truncated text written to a file for viewing/editing. I am sure there is a similar command to save back the text to SQL Server, although that seems like a different question.
EDIT: It turns out that there was an answer by @dvlsc that described this approach as a secondary solution. I think I missed it in the first place because it was listed as a secondary answer. I am going to leave my answer, which focuses on the PowerShell approach, but wanted to at least give credit where it was due.
If you only have to view it, I've used this:
print cast(dbo.f_functiondeliveringbigformattedtext(seed) as text)
The end result is that I get line feeds and all the content in the messages window of SSMS.
Of course, it only allows for a single cell - if you want to do a single cell from a number of rows, you could do this:
declare @T varchar(max)=''
select @T=@T
+ isnull(dbo.f_functiondeliveringbigformattedtext(x.a),'NOTHINGFOUND!')
+ replicate(char(13),4)
from x -- table containing multiple rows and a value in column a
print @T
I use this to validate JSON strings generated by SQL code. Too hard to read otherwise!
Use Visual Studio Code with the SQL Server plugin. Super useful for JSON.
Alternative 1: Right Click to copy cell and Paste into Text Editor (hopefully with utf-8 support)
Alternative 2: Right click and export to CSV File
Alternative 3: Use SUBSTRING function to visualize parts of the column. Example:
SELECT SUBSTRING(fileXml,2200,200) FROM mytable WHERE id=123456
The easiest way to quickly view large varchar/text column:
declare @t varchar(max)
select @t = long_column from table
print @t

Strange symbols in SQLite text field

I have a simple SQLite database where data (names) is added with a C# application. The names usually get copied and pasted from .pdf files. I found out that copying a name from a .pdf sometimes generates some weird symbols. While browsing the data with SQLite DB Browser I saw that some records in my database have things mingled in between, like 'DC3', 'FS', 'US' and so on.
This messes with the WHERE clause in my queries. For example, the following query would yield 0 results:
SELECT Id FROM tblPerson WHERE Name = 'Alex Denelgo';
Can someone explain what these symbols are and how can I write query to find all the "corrupted" name records? I can't go one by one manually with browser since the data already contains thousands of different names.
It seems these symbols are non-printable ASCII control characters.
I found the "corrupted" records using a regex. If you have the same problem, you can use the following query to find these kinds of records. It selects all records minus the records that contain only the letters a-z, spaces and dots; you can of course modify the regex for your case:
SELECT Name FROM tblPerson
EXCEPT
SELECT Name FROM tblPerson WHERE Name REGEXP '^[A-Za-z .]+$';
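Note that REGEXP is only available in SQLite when the host application provides a regexp function, so the query above may not run everywhere. Below is a small Python sketch of the same idea that registers such a function itself and searches directly for control characters instead of whitelisting letters. The table and column names come from the question; the sample rows and the strip_control helper are made up for illustration.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X): pattern first, then value.
    return int(value is not None and re.search(pattern, value) is not None)


def strip_control(text):
    """Remove ASCII control characters (0x00-0x1F and 0x7F) from text."""
    return re.sub(r"[\x00-\x1f\x7f]", "", text)


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE tblPerson (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO tblPerson (Name) VALUES (?)",
    # The second name contains DC3 (0x13), as it might after a PDF copy/paste.
    [("Alex Denelgo",), ("Alex\x13 Denelgo",), ("J. Smith",)],
)

# Find every row whose name contains at least one control character.
corrupted = conn.execute(
    "SELECT Id, Name FROM tblPerson WHERE Name REGEXP ? ORDER BY Id",
    ("[\\x00-\\x1f\\x7f]",),
).fetchall()

# Cleaned versions that could be written back with an UPDATE.
cleaned = [(row_id, strip_control(name)) for row_id, name in corrupted]
print(cleaned)
```

Searching for the control characters directly has the advantage that names with accents or hyphens are not flagged as corrupted.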

I need to create a new ETL C# process with large .CSV files to SQL Tables Add/Update records

I need to bring in a number of .CSV files into unique keyed SQL tables (table names and column names match from source to target). I started looking at libs like Cinchoo-ETL, but I need to do an "upsert", meaning update if the record is present and insert if it's not. I'm not sure if Cinchoo-ETL or some other lib has this feature built in.
For example, let's say the SQL Server Customer table has some records in it, and Cust# is the primary key:
Cust# Name
1 Bob
2 Jack
The CSV file looks something like this:
Cust#,Name
2,Jill
3,Roger
When the ETL program runs it needs to update Cust# 2 from Jack to Jill and insert a new cust# 3 record for Roger.
Speed and reusability are important, as there will be 80 or so different tables, and some of the tables can have several million records in them.
Any ideas for a fast easy way to do this? Keep in mind I'm not a daily developer so examples would be awesome.
Thanks!
You are describing something that can be done with a tool I developed. It's called Fuzible (www.fuzible-app.com): in Synchronization mode, it allows you to choose the behavior of the target table (allow INSERT, UPDATE, DELETE). Your source can be any CSV file and your target any database.
You can contact me from the website if you need a how-to.
The software is free :)
What you have to do is to create a Job with your CSV path as a Source connection, then your Database as the Target connection.
Choose the "Synchronization" mode, which, as opposed to the "Replication" mode, will compare Source and Target data.
Then, you can write as many queries as you want (one for each CSV file) like this :
MyOutputTable:SELECT * FROM MyCSVFile.CSV
No need to write more complex queries if both the CSV and the database table share the same schema (same columns).
The software should be able to do the rest :) It updates rows that need to be updated and creates new rows if required.
This is what I did in a recent SSIS job. I load the data into a temp table, then just use a regular SQL query to perform the comparison. This may be cumbersome on tables with lots of fields.
-- SEE DIFFERENCES FOR YOUR AMUSEMENT
SELECT *
FROM Accounts a
INNER JOIN
DI_Accounts da
ON a.CustomerNumber = da.CustomerNumber AND (
a.FirstName <> da.FirstName
)
-- UPDATE BASED ON DIFFERENCES
UPDATE a
SET
a.FirstName = da.FirstName
FROM Accounts a
INNER JOIN
DI_Accounts da
ON a.CustomerNumber = da.CustomerNumber AND (
a.FirstName <> da.FirstName
)
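The staging-table approach also needs an insert step for rows that do not exist in the target yet. Here is a rough end-to-end sketch in Python, using the Customer example from the question. It runs against an in-memory SQLite database purely so that it is self-contained; that is a stand-in, not SQL Server, and the Staging table name is made up. On SQL Server the same two statements (or a MERGE) would run against the real tables after a bulk load into the staging table.

```python
import csv
import io
import sqlite3

# CSV content from the question (would normally be read from a file).
csv_text = "Cust#,Name\n2,Jill\n3,Roger\n"

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Customer ("Cust#" INTEGER PRIMARY KEY, Name TEXT)')
conn.executemany("INSERT INTO Customer VALUES (?, ?)", [(1, "Bob"), (2, "Jack")])

# Step 1: load the CSV into a staging table (hypothetical name).
conn.execute('CREATE TABLE Staging ("Cust#" INTEGER PRIMARY KEY, Name TEXT)')
rows = [(int(r["Cust#"]), r["Name"]) for r in csv.DictReader(io.StringIO(csv_text))]
conn.executemany("INSERT INTO Staging VALUES (?, ?)", rows)

# Step 2: update rows that exist in both tables but differ.
conn.execute(
    """
    UPDATE Customer
    SET Name = (SELECT s.Name FROM Staging s WHERE s."Cust#" = Customer."Cust#")
    WHERE EXISTS (
        SELECT 1 FROM Staging s
        WHERE s."Cust#" = Customer."Cust#" AND s.Name <> Customer.Name
    )
    """
)

# Step 3: insert rows that are only in the staging table.
conn.execute(
    """
    INSERT INTO Customer ("Cust#", Name)
    SELECT s."Cust#", s.Name FROM Staging s
    WHERE NOT EXISTS (SELECT 1 FROM Customer c WHERE c."Cust#" = s."Cust#")
    """
)
conn.commit()

result = conn.execute('SELECT "Cust#", Name FROM Customer ORDER BY "Cust#"').fetchall()
print(result)
```

Doing the comparison set-based in the database, rather than row by row in application code, is what keeps this workable for tables with millions of rows.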
I would recommend that you take a look at the nuget package ETLBox and the necessary extension packages for Csv & Sql Server (ETLBox.Csv + ETLBox.SqlServer).
This would allow you to write a code like this:
//Create the components
CsvSource source = new CsvSource("file1.csv");
SqlConnectionManager conn = new SqlConnectionManager("..connection_string_here..");
DbMerge dest = new DbMerge(conn, "DestinationTableName");
dest.MergeProperties.IdColumns.Add(new IdColumn() { IdPropertyName = "Cust#" });
dest.MergeMode = MergeMode.Full; //To create the deletes
dest.CacheMode = CacheMode.Partial; //Enable for bigger data sets
//Linking
source.LinkTo(dest);
//Execute the data flow
Network.Execute(source);
This code snippet would do the corresponding inserts, updates and deletes in the database table for one file. Make sure that the header names match exactly with the column names in your database table (case-sensitive). For bigger data sets you need to enable the partial cache, to avoid having all data loaded into memory.
It will use dynamic objects under the hood (ExpandoObject). You can find more information about the merge and the tool on the website (www.etlbox.net)
The only downside is that ETLBox is not open source. But the package allows you to work with data sets of up to 10,000 rows to check if it suits your needs.

Comparing two columns looking for similarity

I've got a program that looks at which files have changed in each SVN commit and should also highlight the area that has changed. I take the data retrieved from SVN and put it into a table in a SQL Server database. What I'd like to do is compare the paths to see which area has been affected. I already have a table that maps a path to the affected area.
Example:
SVN:
branches/Projects/Enhancements2015Q1/WMDB/WMDB.cs
this is the path that has been found by the code
Compare table
branches/Projects/Enhancements2015Q1/GEM4/Utilities/Utilities.csproj = Utilities
branches/Projects/Enhancements2015Q1/WMDB/WMDB.cs = WMDB
trunk/src/GEM 4/GEM4/UI/Forms/AutoRenderOptionsForm.cs = UI
So what I'd like to find is that the path found in SVN has changed the WMDB area.
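One possible way to do the comparison, sketched in Python with the paths from the example: try an exact match first, and otherwise fall back to the known path that shares the most leading directories. The fallback rule is an assumption on my part (the question only shows an exact match), and the function names are made up.

```python
# Compare table from the question: known path -> area.
AREA_BY_PATH = {
    "branches/Projects/Enhancements2015Q1/GEM4/Utilities/Utilities.csproj": "Utilities",
    "branches/Projects/Enhancements2015Q1/WMDB/WMDB.cs": "WMDB",
    "trunk/src/GEM 4/GEM4/UI/Forms/AutoRenderOptionsForm.cs": "UI",
}


def shared_depth(path_a, path_b):
    """Count how many leading path segments two paths have in common."""
    count = 0
    for seg_a, seg_b in zip(path_a.split("/"), path_b.split("/")):
        if seg_a != seg_b:
            break
        count += 1
    return count


def find_area(svn_path, area_by_path=AREA_BY_PATH):
    """Return the area for an SVN path, or None if nothing is similar."""
    if svn_path in area_by_path:
        return area_by_path[svn_path]
    # Fallback (assumed rule): the known path with the longest shared prefix.
    best = max(area_by_path, key=lambda known: shared_depth(svn_path, known))
    return area_by_path[best] if shared_depth(svn_path, best) > 0 else None


print(find_area("branches/Projects/Enhancements2015Q1/WMDB/WMDB.cs"))
```

In SQL Server the exact-match part is a plain join on the path column; the prefix fallback is easier to express in application code like this.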
