Programmatically remove special characters from HTML source in C#

Issue: I am downloading an Excel file from an online service. The problem is that it downloads with an XLS extension, but when you do a Save As, the file type shows it's actually a Web Page (HTML) file. What I have to do is open it in Excel and save it as an Excel file. When I import the file programmatically into Postgres, the data imports fine, no problem. The problem appears when I save the data from pgAdmin: the CSV shows those special characters. They look like foreign characters.
However, let's go back to the online service. If I manually copy the list and paste it into Notepad (which strips those characters), then copy the list back into an Excel file, there is no issue.
The question is: how can I strip these characters programmatically instead of doing it manually with Notepad?
Example of the issue: look at the image. The date/time has the special character, and the other cell should say "FILLER 5".
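A minimal sketch of doing the Notepad clean-up in code. It assumes (the screenshot is not available here, so this is unconfirmed) that the stray characters are non-breaking spaces (U+00A0) plus other non-ASCII or control characters. HTML exports commonly contain non-breaking spaces, and when UTF-8 text is decoded with a Windows code page such a space typically shows up as "Â" followed by a blank, which would match the "foreign characters" description. The class name SpecialCharCleaner is hypothetical; you would run Clean over each cell value before inserting it into Postgres.

```csharp
using System.Text;

// Hypothetical helper: normalizes cell text taken from an HTML-based ".xls" export.
// Assumption: the unwanted characters are U+00A0 (non-breaking space) and other
// non-ASCII/control characters. Adjust the rules if your data differs.
public static class SpecialCharCleaner
{
    public static string Clean(string input)
    {
        if (string.IsNullOrEmpty(input)) return input ?? string.Empty;

        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c == '\u00A0')
            {
                // Non-breaking space: replace with an ordinary space ("FILLER 5").
                sb.Append(' ');
            }
            else if (c == '\t' || (c >= ' ' && c <= '~'))
            {
                // Keep tabs and printable ASCII characters.
                sb.Append(c);
            }
            // Anything else (control characters, other non-ASCII) is dropped.
        }

        // Remove spaces left at either end, e.g. after a date/time value.
        return sb.ToString().Trim();
    }
}
```

This filter also drops legitimate accented letters. If your data contains them, decoding the file with the correct encoding, or narrowing the filter to the specific characters you actually see, is the safer design.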

Related

How to get part of a docx file as an image, not the whole page

I have been trying to insert the contents of a docx file into Crystal Reports. (In Crystal Reports, if I insert an OLE object and select a docx file, the program imports only the used part of the Word file, not the whole page.)
I have searched for ways to convert a Word document to an image, but all I found was about getting a whole page or a specific image in a page. In my case I can have text, line objects and maybe some graphics spread over 6-7 paragraphs, and only the used part of the page (those 6-7 paragraphs) should become my image file.
Our customers use these Word files as the report header. For now, the only option I have is to take a screenshot of the design, and it doesn't always look good. Is there a way to get just this part as an image?
Note: I can't use the doc file directly in Crystal Reports; I need to store the data in a SQL database, so it has to be a graphic file.
You may find the Selection.CopyAsPicture method useful; it works the same way as the Copy method. Basically, you select the required content in a Word document and then use this method to take a picture of the selected content and keep it in memory for further use. For example, the following macro copies the contents of the active document as a picture and pastes it as a picture at the end of the document:
Sub CopyPasteAsPicture()
ActiveDocument.Content.Select
With Selection
.CopyAsPicture
.Collapse Direction:=wdCollapseEnd
.PasteSpecial DataType:=wdPasteMetafilePicture
End With
End Sub
In C# you can use the same properties and methods through Word Interop; the Word object model is the same for every kind of client application.

Get page count of RTF, TXT and XLS documents using Open XML in C#

I want to count the pages of .rtf, .txt and .xls(x) files using Open XML.
One could suggest the Office Interop assemblies (which can also count the pages of TXT and RTF files, and that does work), but I have already tried them and can't use them because they take too much time, especially on large files.
I've searched a lot but couldn't find any reference on the web.
Digging deeper, for Excel files I found something described on THIS site, but it only sets up the page print settings.
Before printing, I want to show the page count to the user.
I'm also still looking for a way to handle TXT and RTF files. Please share whether this can also be done via Open XML, and how.
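The Open XML SDK only reads Office Open XML packages (.docx, .xlsx, .pptx), so it cannot open .txt, .rtf or legacy binary .xls files. For plain text, a page count only exists relative to a print layout, so one hedged option is to estimate it. The sketch below assumes a fixed number of lines per page and characters per line (60 and 80 are placeholder values, not Notepad's or Word's actual settings) and counts wrapped lines; the real printed count will vary with font, margins and printer. TextPageEstimator is a hypothetical name.

```csharp
using System;

// Hypothetical helper: rough page estimate for plain text under an assumed layout.
public static class TextPageEstimator
{
    // linesPerPage and charsPerLine are assumptions, not values read from any printer.
    public static int EstimatePages(string text, int linesPerPage = 60, int charsPerLine = 80)
    {
        if (linesPerPage <= 0) throw new ArgumentOutOfRangeException(nameof(linesPerPage));
        if (charsPerLine <= 0) throw new ArgumentOutOfRangeException(nameof(charsPerLine));
        if (string.IsNullOrEmpty(text)) return 0;

        int totalLines = 0;
        foreach (string line in text.Replace("\r\n", "\n").Split('\n'))
        {
            // A long line wraps onto several printed lines; an empty line still takes one.
            int wrapped = (line.Length + charsPerLine - 1) / charsPerLine;
            totalLines += Math.Max(1, wrapped);
        }

        // Round up: a partially filled last page still counts as a page.
        return (totalLines + linesPerPage - 1) / linesPerPage;
    }
}
```

Because it only scans the text once, this stays fast on large files, which was the problem with the Interop approach; the trade-off is that it is an estimate, not the count a printer would produce.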

How do I write to Excel cells so that they are formatted correctly with hidden "tick" marks

I have to replace a broken SSIS package that basically just combines the data from a number of different Excel files, and creates a single master file. My C# desktop application is putting all the data into the Excel file correctly, but the one problem is that the original file, which was of type .xls, has some kind of hidden formatting on the cells.
When you just view the data in the cells, it looks normal, but when you click on a cell and view it in the edit box, there is a tick mark (it looks like this ') in front of the data. Editing it out does nothing; it just reappears.
I am guessing it is a "cheater" way to force the data to be of type character, since Excel is such a stinker about turning text which is made up of only numbers into a cell of type number, even if you write it as a string.
I wish we could do without it, but when we try to use a file without that formatting, it blows up the application. And I do not have the time or budget to rewrite the application.
How can I duplicate this "tick" mark in the files which I am writing? Currently I'm using EPPlus for writing, but I can also use Microsoft.Office.Interop.Excel.
If this is too difficult, I would also be open to someone telling me how to manually edit the file after I have created it programmatically, and just add the cell formatting to the data.
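That apostrophe is Excel's quote prefix: it marks the cell as text and is stored as a cell property rather than as part of the value. A hedged sketch, assuming you do the final write through Microsoft.Office.Interop.Excel: Excel generally treats a string assigned to a cell that starts with an apostrophe the same way as typed input, so prefixing every value with ' should reproduce the tick. The helper below (QuotePrefixHelper, a hypothetical name) only builds the object[,] array you would assign to Range.Value2 for a block of cells of the same size; the Interop call itself is not shown and should be verified against your files.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds a 2-D array for assignment to an Interop Range.Value2.
// Assumption: Excel turns a leading apostrophe in an assigned string into the
// quote prefix, so "00123" stays text instead of becoming the number 123.
public static class QuotePrefixHelper
{
    public static object[,] ToQuotePrefixedArray(IList<string[]> rows)
    {
        if (rows == null) throw new ArgumentNullException(nameof(rows));

        // Size the array to the widest row so ragged input still fits.
        int rowCount = rows.Count;
        int colCount = 0;
        foreach (string[] row in rows) colCount = Math.Max(colCount, row.Length);

        var result = new object[rowCount, colCount];
        for (int i = 0; i < rowCount; i++)
        {
            for (int j = 0; j < colCount; j++)
            {
                string value = j < rows[i].Length ? rows[i][j] : null;
                // Prefix every non-null value; missing cells stay empty (null).
                result[i, j] = value == null ? null : "'" + value;
            }
        }
        return result;
    }
}
```

Writing the whole array in one assignment also avoids a COM round trip per cell, which matters when combining many source files.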

Open a file using Notepad and copy the data to Excel in C#

I have a pipe-delimited CSV file. Is there any way in C# to open the file in Notepad and then copy the data to Excel? I want this specific step to be performed, not just a simple copy and paste.
The issue is that the file gets corrupted if I open it directly in Excel.
So I open it in Notepad first, copy it into a new Excel file and then do the remaining operations in Excel. This gives the correct output, unlike opening the file via Excel.
Can someone please let me know whether this can be achieved in C#?
You can read the whole file into a string, replace | with , and write it back. Excel should then open the file without any issue.
Sample Code
string text = System.IO.File.ReadAllText(@"C:\temp\test.csv");
System.IO.File.WriteAllText(@"C:\temp\test.csv", text.Replace('|', ','));
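Replacing every pipe with a comma breaks any field that itself contains a comma (it would split into two columns). A slightly more careful sketch quotes such fields the way CSV expects (RFC 4180 style: wrap in double quotes, double any embedded quotes). It assumes pipes never appear inside field values and no field spans multiple lines; PipeToCsv and its methods are hypothetical names.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: converts pipe-delimited text to comma-separated CSV.
// Assumptions: no pipes inside values, no embedded line breaks.
public static class PipeToCsv
{
    public static string ConvertLine(string line)
    {
        return string.Join(",", line.Split('|').Select(QuoteIfNeeded));
    }

    private static string QuoteIfNeeded(string field)
    {
        // Commas, quotes or line breaks force quoting; embedded quotes are doubled.
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // outputPath must differ from inputPath: File.ReadLines reads lazily while writing.
    public static void ConvertFile(string inputPath, string outputPath)
    {
        File.WriteAllLines(outputPath, File.ReadLines(inputPath).Select(ConvertLine));
    }
}
```

For example, PipeToCsv.ConvertFile(@"C:\temp\test.csv", @"C:\temp\test_comma.csv") leaves the original file untouched and writes the converted copy next to it.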

Paste-Link Excel ranges into C# app

I am writing a C# app where I need to paste/link tables/ranges from existing Excel documents.
Functionality that I am looking for is this:
user can select a range of cells in an open Excel doc and do a Copy
user switches to my C# app and does a paste-link; my app shows the table from Excel.
user can edit the source Excel doc - this does not automatically get reflected in the C# app. But I want to provide a Refresh button that when clicked will update the C# app based on the latest data from the linked Excel sheet.
I have figured out how to do a basic copy/paste, but I cannot figure out how to do this paste-link. Please note I do not want to ask the user in my C# app for any cell ranges; I simply want to paste-link whatever is already on the clipboard.
Any ideas whether this can be done? It is all Microsoft, so I would be surprised if it can't be, but I am a C# novice.
Thanks for all input.
I figured it out. Here are the steps.
1. The user copies a range in an Excel sheet. It goes to the clipboard in a number of formats, but the CSV and ObjectLink formats are of particular interest.
2. In the C# app, trigger a Paste-Link function (this can be any button):
   a. Retrieve the data from the clipboard using the ObjectLink format. This comes out as text which contains:
      - the Excel version identifier
      - the path to the Excel file
      - the sheet name and the selected range in R1C1 notation
   b. Save the ObjectLink data in your C# app; we will use it later as part of the refresh.
   c. Retrieve the data from the clipboard using the CSV format. Parse it and present it in the C# app. I converted it to HTML, since that is what I am building.
3. Modify the original source Excel file: change something in the cells that were part of the original range, then save the file.
4. Go back to the C# app and trigger the Refresh functionality (again, any button). In your code, do the following:
   a. Using the ObjectLink data saved in step 2, open the Excel sheet in the background with the Excel Interop API. Select the sheet and range, and copy the range programmatically to the clipboard.
   b. Invoke the same clipboard read as in the last part of step 2: get the updated Excel data in CSV format from the clipboard and replace the original representation you built in step 2.
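The clipboard-independent parts of these steps can be sketched in plain C#. Assumptions: the ObjectLink data has already been read from the clipboard and decoded to a string (the answer does not show that code); its fields are NUL-separated in the order application/version, document (path and sheet), item (range in R1C1 notation), which matches the contents listed above but is not verified here; and the CSV text uses a comma separator with no line breaks inside quoted cells (check what your Excel locale actually puts on the clipboard). PasteLinkSketch and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helpers for the text-processing parts of paste-link and refresh.
public static class PasteLinkSketch
{
    // Assumed layout: "App\0Document\0Item\0\0" (empty trailing fields ignored).
    public static (string App, string Document, string Item) ParseObjectLink(string data)
    {
        string[] parts = data.Split(new[] { '\0' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 3) throw new FormatException("Unexpected ObjectLink data.");
        return (parts[0], parts[1], parts[2]);
    }

    // Parses "R2C3" or "R2C3:R5C4" into 1-based row/column bounds.
    public static (int Row1, int Col1, int Row2, int Col2) ParseR1C1(string range)
    {
        Match m = Regex.Match(range, @"^R(\d+)C(\d+)(?::R(\d+)C(\d+))?$");
        if (!m.Success) throw new FormatException("Not an R1C1 range: " + range);
        int r1 = int.Parse(m.Groups[1].Value);
        int c1 = int.Parse(m.Groups[2].Value);
        bool singleCell = !m.Groups[3].Success;
        return (r1, c1,
                singleCell ? r1 : int.Parse(m.Groups[3].Value),
                singleCell ? c1 : int.Parse(m.Groups[4].Value));
    }

    // Splits one CSV line, honoring double-quoted fields and "" escapes.
    public static List<string> SplitCsvLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"') { current.Append('"'); i++; }
                else if (c == '"') inQuotes = false;
                else current.Append(c);
            }
            else if (c == '"') inQuotes = true;
            else if (c == ',') { fields.Add(current.ToString()); current.Clear(); }
            else current.Append(c);
        }
        fields.Add(current.ToString());
        return fields;
    }

    // Renders clipboard CSV as a simple HTML table, HTML-encoding every cell.
    public static string CsvToHtmlTable(string csv)
    {
        var html = new StringBuilder("<table>");
        foreach (string line in csv.Replace("\r\n", "\n").Split('\n'))
        {
            if (line.Length == 0) continue; // skip blank lines, e.g. the trailing one
            html.Append("<tr>");
            foreach (string cell in SplitCsvLine(line))
                html.Append("<td>").Append(WebUtility.HtmlEncode(cell)).Append("</td>");
            html.Append("</tr>");
        }
        return html.Append("</table>").ToString();
    }
}
```

ParseR1C1 yields bounds you could reuse when reopening the workbook through Interop during the refresh, and CsvToHtmlTable encodes each cell so values containing < or & render safely.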
This works like a charm, although I have to admit the COM part of opening an Excel document from C# is a bit slow.
I have not found any references to this procedure on the net, but it works for me.
Cheers.
