I have an ASP.NET web application. When I press the Submit button, it calls a function that reads what's in the DB and loads it into an HTML page with a certain template. I want to know how to replace static data in the original HTML file with dynamic data from a MySQL database using C#. In my code I read the input file and create an output file to write to, as shown below:
StreamReader sr = new StreamReader("filepath/inputcv.html");
StreamWriter sw = new StreamWriter("filepath/outputcv.html");
This is the part of the code where I need to replace the content of the paragraph with the one from my database:
<div class="sectionContent">
<p>#1</p>
</div>
I saw this code and wanted to do something like it, but I don't know how to write the query in it:
StreamReader sr = new StreamReader("path/to/file.txt");
StreamWriter sw = new StreamWriter("path/to/outfile.txt");
string sLine = sr.ReadLine();
for (; sLine != null; sLine = sr.ReadLine())
{
sLine = "{" + sLine.Replace(" ", ", ") + "}";
sw.Write(sLine);
}
It's possible to store the external data in a text file, an XML file, or a database and use it to update HTML page content dynamically. The first thing to clarify is what acts as the "filter" for those dynamic updates; in other words, which part of your code changes the SELECT statement (in the case of MySQL)? If that statement stays the same, then the underlying data must be changing, so what controls those changes in the DB? And what refresh rate do you want for the page updates? This should be clarified before proceeding to actual code.
Hope this will help. My best, AB
Here is what I came up with based on what you have provided. Please comment with more details if this is not enough.
string strHTMLPage = "";
string strNewHTMLPage = "";
int intStartIndex = 0;
int intEndIndex = 0;
string strNewDataToBeInserted = "Assumes you loaded this string with the data you want inserted";
StreamReader sr = new StreamReader("filepath/inputcv.html");
strHTMLPage = sr.ReadToEnd();
sr.Close();
intStartIndex = strHTMLPage.IndexOf("<div class=\"sectionContent\">", 0) + 28;
intStartIndex = strHTMLPage.IndexOf("<p>", intStartIndex) + 3;
intEndIndex = strHTMLPage.IndexOf("</p>", intStartIndex);
strNewHTMLPage = strHTMLPage.Substring(0, intStartIndex);
strNewHTMLPage += strNewDataToBeInserted;
strNewHTMLPage += strHTMLPage.Substring(intEndIndex);
StreamWriter sw = new System.IO.StreamWriter("filepath/outputcv.html", false, Encoding.UTF8);
sw.Write(strNewHTMLPage);
sw.Close();
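If the template marks each slot with a placeholder such as #1, as in the snippet from the question, a plain string replace may be simpler than index arithmetic. Below is a minimal sketch, assuming the database values have already been read into a dictionary keyed by placeholder. The TemplateFiller class and its Fill method are hypothetical names, and the MySQL query itself is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class TemplateFiller
{
    // Replaces each placeholder (for example "#1") with its HTML-encoded value.
    // 'values' is assumed to have been filled from the database beforehand.
    public static string Fill(string template, IDictionary<string, string> values)
    {
        string result = template;
        foreach (KeyValuePair<string, string> pair in values)
        {
            // Encode so that characters like < or & in the data cannot break the markup.
            result = result.Replace(pair.Key, WebUtility.HtmlEncode(pair.Value));
        }
        return result;
    }
}
```

The result can then be written with the StreamWriter as before. Note that a bare #1 is also a prefix of #10, so with more than nine slots a delimited placeholder would be safer.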
I'm developing with C# Word Interop, and I need to know when a page ends, because I'm creating a Word table. The number of rows isn't always the same, and if the page doesn't end, I have to insert a page break.
How can I do that?
If you're using StreamReader to read your Word file, you can do this:
string path = @"C:\..."; // the path of your Word file
StreamReader sr = new StreamReader(path);
while (!sr.EndOfStream) {
string line = sr.ReadLine(); // read the next line
// what you want to do
}
sr.Close();
If you want to know at which line your file ends, create a counter, increment it for each line read, and check its value once the end of the file is reached:
string path = @"C:\..."; // the path of your Word file
int count = 0;
StreamReader sr = new StreamReader(path);
while (!sr.EndOfStream) {
string line = sr.ReadLine(); // read the next line
// what you want to do
count++;
}
sr.Close();
I'm using a CSV file to store data from a DataGridView. Now I'm trying to show a print preview with the data in the CSV file. I'm using this:
public void ReadDocument()
{
string docName = #"printDataGridView.csv";
string docPath = Directory.GetCurrentDirectory();
form1.printDocument1.DocumentName = docName;
using (FileStream stream = new FileStream(docPath + #"\" + docName, FileMode.Open))
using (StreamReader reader = new StreamReader(stream))
{
form1.documentContents = reader.ReadToEnd();
}
form1.stringToPrint = form1.documentContents;
}
My problem is that I'm getting a print preview where the values are just listed, separated by semicolons. What I need is to show the data the way it is displayed in the CSV file: tabular and "clean". So one column with the header and its values right under it, then the second column, and so on...
Is this possible somehow?
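One possible approach, sketched here under assumptions, is to pad every field to the width of its column before assigning the text to stringToPrint. The CsvTable class and ToAlignedText method are hypothetical names. The sketch assumes simple semicolon-separated fields without quoting, and the columns only line up when the text is drawn with a monospaced font.

```csharp
using System;
using System.Linq;
using System.Text;

public static class CsvTable
{
    // Turns separator-delimited lines into space-padded columns.
    public static string ToAlignedText(string csv, char separator = ';')
    {
        string[][] rows = csv
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Split(separator))
            .ToArray();
        if (rows.Length == 0) return string.Empty;

        // Find the widest field in every column.
        int[] widths = new int[rows.Max(r => r.Length)];
        foreach (string[] row in rows)
            for (int c = 0; c < row.Length; c++)
                widths[c] = Math.Max(widths[c], row[c].Length);

        // Pad each field to its column width plus a two-space gap.
        StringBuilder sb = new StringBuilder();
        foreach (string[] row in rows)
        {
            for (int c = 0; c < row.Length; c++)
                sb.Append(row[c].PadRight(widths[c] + 2));
            sb.AppendLine();
        }
        return sb.ToString();
    }
}
```

In ReadDocument this could be used as form1.stringToPrint = CsvTable.ToAlignedText(form1.documentContents);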
I've seen many posts that have helped me get to where I am, but I'm new to programming. My intention is to get the files within the directory "sourceDir" and look for a Regex match. When it finds a match, I want to create a new file with the match as its name. If the code finds another file with the same match (the file already exists), then it should add a new page to that document.
Right now the code works, however instead of adding a new page, it overwrites the first page of the document. NOTE: Every document in the directory is only one page!
string sourceDir = #"C:\Users\bob\Desktop\results\";
string destDir = #"C:\Users\bob\Desktop\results\final\";
string[] files = Directory.GetFiles(sourceDir);
foreach (string file in files)
{
using (var pdfReader = new PdfReader(file.ToString()))
{
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
var text = new StringBuilder();
ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
var currentText =
PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
currentText = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
text.Append(currentText);
Regex reg = new Regex(@"ABCDEFG");
MatchCollection matches = reg.Matches(currentText);
foreach (Match m in matches)
{
string newFile = destDir + m.ToString() + ".pdf";
if (!File.Exists(newFile))
{
using (PdfReader reader = new PdfReader(File.ReadAllBytes(file)))
{
using (Document doc = new Document(reader.GetPageSizeWithRotation(page)))
{
using (PdfCopy copy = new PdfCopy(doc, new FileStream(newFile, FileMode.Create)))
{
var importedPage = copy.GetImportedPage(reader, page);
doc.Open();
copy.AddPage(importedPage);
doc.Close();
}
}
}
}
else
{
using (PdfReader reader = new PdfReader(File.ReadAllBytes(newFile)))
{
using (Document doc = new Document(reader.GetPageSizeWithRotation(page)))
{
using (PdfCopy copy = new PdfCopy(doc, new FileStream(newFile, FileMode.OpenOrCreate)))
{
var importedPage = copy.GetImportedPage(reader, page);
doc.Open();
copy.AddPage(importedPage);
doc.Close();
}
}
}
}
}
}
}
}
Bruno did a great job explaining the problem and how to fix it but since you've said that you are new to programming and you've further posted a very similar and related question I'm going to go a little deeper to hopefully help you.
First, let's write down the knowns:
There's a directory full of PDFs
Each PDF has only a single page
Then the objectives:
Extract the text of each PDF
Compare the extracted text with a pattern
If there's a match, then using the match as the file name, do one of:
If the file exists, append the source PDF to it
If the file doesn't exist, create a new file with the PDF
There are a couple of things that you need to know before proceeding. You tried to work in "append mode" by using FileMode.OpenOrCreate. It was a good guess but incorrect. The PDF format has both a beginning and an end, so "start here" and "end here". When you attempt to append another PDF (or anything, for that matter) to an existing file, you are just writing past the "end here" section. At best, that's junk data that gets ignored, but more likely you'll end up with a corrupt PDF. The same is true of almost any file format. Two concatenated XML files are invalid because an XML document can only have one root element.
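The XML case is easy to check for yourself. Here is a small sketch using XDocument from System.Xml.Linq; the ConcatDemo class and IsWellFormed method are hypothetical names made up for this illustration.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class ConcatDemo
{
    // Returns true when the text parses as a single well-formed XML document.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            // The parser rejects, among other things, a second root element.
            return false;
        }
    }
}
```

With string one = "<root><item/></root>"; the call IsWellFormed(one) succeeds, while IsWellFormed(one + one) fails because the glued text has two root elements. A PDF appended to another PDF is broken in the same way, just less visibly.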
Second but related, iText/iTextSharp cannot edit existing files. This is very important. It can, however, create brand new files that happen to have the exact or possibly modified versions of other files. I don't know if I can stress how important this is.
Third, you are using a line that gets copied over and over again but is very wrong and can actually corrupt your data. For why it is bad, read https://stackoverflow.com/a/10191879/231316.
currentText = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
Fourth, you are using RegEx which is an overly complicated way to perform a search. Maybe the code that you posted was just a sample but if it wasn't I would recommend just using currentText.Contains("") or if you need to ignore case currentText.IndexOf( "", StringComparison.InvariantCultureIgnoreCase ). For the benefit of the doubt, the code below assumes you have a more complex RegEx.
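As a rough sketch of those two simpler checks (the TextSearch class and its method names are hypothetical, and the search term is whatever literal you are looking for):

```csharp
using System;

public static class TextSearch
{
    // Case-sensitive search for a literal term.
    public static bool ContainsExact(string text, string term)
    {
        return text.Contains(term);
    }

    // Case-insensitive search; IndexOf returns -1 when the term is absent.
    public static bool ContainsIgnoreCase(string text, string term)
    {
        return text.IndexOf(term, StringComparison.InvariantCultureIgnoreCase) >= 0;
    }
}
```

Either call could stand in for reg.Matches(currentText) when all you need is a yes or no answer for a fixed string.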
With all that, below is a full working example that should walk you through everything. Since we don't have access to your PDFs, the second section actually creates 100 sample PDFs with our search terms occasionally added to them. Your real code obviously wouldn't do this but we need common ground to work with you on. The third section is the search and merge feature that you are trying to do. Hopefully the comments in the code explain everything.
/**
* Step 1 - Variable Setup
*/
//This is the folder that we'll be basing all other directory paths on
var workingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
//This folder will hold our PDFs with text that we're searching for
var folderPathContainingPdfsToSearch = Path.Combine(workingFolder, "Pdfs");
var folderPathContainingPdfsCombined = Path.Combine(workingFolder, "Pdfs Combined");
//Create our directories if they don't already exist
System.IO.Directory.CreateDirectory(folderPathContainingPdfsToSearch);
System.IO.Directory.CreateDirectory(folderPathContainingPdfsCombined);
var searchText1 = "ABC";
var searchText2 = "DEF";
/**
* Step 2 - Create sample PDFs
*/
//Create 100 sample PDFs
for (var i = 0; i < 100; i++) {
using (var fs = new FileStream(Path.Combine(folderPathContainingPdfsToSearch, i.ToString() + ".pdf"), FileMode.Create, FileAccess.Write, FileShare.None)) {
using (var doc = new Document()) {
using (var writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
//Add a title so we know what page we're on when we combine
doc.Add(new Paragraph(String.Format("This is page {0}", i)));
//Add various strings every once in a while.
//(Yes, I know this isn't evenly distributed but I haven't
// had enough coffee yet.)
if (i % 10 == 3) {
doc.Add(new Paragraph(searchText1));
} else if (i % 10 == 6) {
doc.Add(new Paragraph(searchText2));
} else if (i % 10 == 9) {
doc.Add(new Paragraph(searchText1 + searchText2));
} else {
doc.Add(new Paragraph("Blah blah blah"));
}
doc.Close();
}
}
}
}
/**
* Step 3 - Search and merge
*/
//We'll search for two different strings just to add some spice
var reg = new Regex("(" + searchText1 + "|" + searchText2 + ")");
//Loop through each file in the directory
foreach (var filePath in Directory.EnumerateFiles(folderPathContainingPdfsToSearch, "*.pdf")) {
using (var pdfReader = new PdfReader(filePath)) {
for (var page = 1; page <= pdfReader.NumberOfPages; page++) {
//Get the text from the page
var currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, new SimpleTextExtractionStrategy());
//DO NOT DO THIS EVER!! See this for why https://stackoverflow.com/a/10191879/231316
//currentText = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
//Match our pattern against the extracted text
var matches = reg.Matches(currentText);
//Bail early if we can
if (matches.Count == 0) {
continue;
}
//Loop through each match
foreach (var m in matches) {
//This is the file path that we want to target
var destFile = Path.Combine(folderPathContainingPdfsCombined, m.ToString() + ".pdf");
//If the file doesn't already exist then just copy the file and move on
if (!File.Exists(destFile)) {
System.IO.File.Copy(filePath, destFile);
continue;
}
//The file exists so we're going to "append" the page
//However, writing to the end of file in Append mode doesn't work,
//that would be like "add a file to a zip" by concatenating
//two files. In this case, we're actually creating a brand new file
//that "happens" to contain the original file and the matched file.
//Instead of writing to disk for this new file we're going to keep it
//in memory, delete the original file and write our new file
//back onto the old file
using (var ms = new MemoryStream()) {
//Use a wrapper helper provided by iText
var cc = new PdfConcatenate(ms);
//Open for writing
cc.Open();
//Import the existing file
using (var subReader = new PdfReader(destFile)) {
cc.AddPages(subReader);
}
//Import the matched file
//The OP stated a guarantee of only 1 page so we don't
//have to mess around with specifying which page to import.
//Also, PdfConcatenate closes the supplied PdfReader so
//don't reuse the variable pdfReader; open a new reader instead.
using (var subReader = new PdfReader(filePath)) {
cc.AddPages(subReader);
}
//Close for writing
cc.Close();
//Erase our existing file
File.Delete(destFile);
//Write our new file
File.WriteAllBytes(destFile, ms.ToArray());
}
}
}
}
}
I'll write this in pseudo code.
You do something like this:
// loop over different single-page documents
for () {
// introduce a condition
if (condition == met) {
// create single-page PDF
new Document();
new PdfCopy();
document.Open();
copy.addPage(singlePage);
document.Close();
}
}
This means that you are creating a single-page PDF every time the condition is met. Incidentally, you're overwriting existing files many times.
What you should do is something like this:
// Create a document with as many pages as times a condition is met
new Document();
new PdfCopy();
document.Open();
// loop over different single-page documents
for () {
// introduce a condition
if (condition == met) {
copy.addPage(singlePage);
}
}
document.Close();
Now you are possibly adding more than one page to the new document you are creating with PdfCopy. Be careful: an exception can be thrown if the condition is never met.
I am exporting data into Excel from a web page. This should be a no-brainer, but there are <p> tags in the data. This causes Excel to create new rows when the data should all be in the same cell. After some research I found that mso-data-placement should do the trick, but it's not working. Excel opens and the data is displayed, but extra unnecessary rows are created. Here is the code I use to export the data:
protected void doexcel()
{
string style = @"<style type='text/css'>P {mso-data-placement:same-cell; font-weight:bold;}</style>";
HttpResponse response = HttpContext.Current.Response;
// first let's clean up the response.object
response.Clear();
response.Charset = "";
//set the response mime type for excel
response.ContentType = "application/vnd.ms-excel";
Random RandomClass = new Random();
int RandomNumber = RandomClass.Next();
String filename = "a" + RandomNumber + DateTime.Now + ".xls";
response.AddHeader("Content-Disposition", "attachment;filename=\"" + filename + "\"" );
// create a string writer
using (StringWriter sw = new StringWriter())
{
using (HtmlTextWriter htw = new HtmlTextWriter(sw))
{
HttpContext.Current.Response.Write(style);
SqlDataSourceEmployeeAssets.ConnectionString = MyObjects.Application.CurrentContext.ConnectionString;
String sql = (string)Session["sql"];
SqlDataSourceEmployeeAssets.SelectCommand = sql;
// lCount.Text = "Query returned " + getCount(query) + " rows.";
DataGrid dge = new DataGrid();
dge.DataSource = SqlDataSourceEmployeeAssets;
dge.DataBind();
dge.RenderControl(htw);
response.Write(sw.ToString());
response.End();
}
}
}
This is an example of the raw data in the database that is giving me grief:
<P>4/13/2011 : Cheng "Jonathan" Vaing is with BSES Graffiti Unit.</P><P>4/13/2011 : Cheng "Jonathan" Vaing is with</P>
Suggestions?
I tried a couple of other things
I went straight to the data and added the mso-data-placement attribute to the paragraph tag inline. Still didn't work. The data looked like this
<P style="mso-data-placement:same-cell> my data </p>
I tried other mso-* attributes, that didn't work either. For example, I changed my stylesheet to look like this
<style type='text/css'>P {mso-highlight:yellow}</style>";
Why oh why doesn't Excel recognize my mso-* attributes?!?!
There is a solution but it is not clean.
After the dge.DataBind, place the following code. This will encode the text of each cell
foreach (DataGridItem dgi in dge.Items)
{
foreach (TableCell cell in dgi.Cells)
{
cell.Text = WebUtility.HtmlEncode(cell.Text); // WebUtility lives in System.Net
}
}
The Excel file, when opened, should show the raw data with the markup, all in one cell.
I found that this works because Excel actually encodes the text, as well. To see what Excel does in action, do the following:
Create a new workbook in Excel (I am using Office 2013).
In the first cell, paste the raw data (as you have it displayed). Do this by first pressing F2 (insert into cell), then paste the text.
Save the workbook as an HTML file (or web page).
Using Windows Explorer, go to the folder where you saved the file. There should be a hidden folder (I think it is hidden) with the same name as your file. For example, if your workbook is Book1.htm, there should be a folder labeled Book1_files.
In this folder, there should be an HTM file named sheet001.htm. Open this file in Notepad (or any text editor, but not Excel or Word).
Locate your raw data. You will see that the text is not showing the HTML markup, rather it is showing the encoded version.
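To see what the encoding step does to a single cell value outside of Excel, here is a small sketch. EncodeDemo and EncodeCell are hypothetical names; WebUtility.HtmlEncode is the same call used in the loop above.

```csharp
using System;
using System.Net;

public static class EncodeDemo
{
    // Encodes markup so that a consumer shows it as literal text
    // instead of interpreting the tags.
    public static string EncodeCell(string raw)
    {
        return WebUtility.HtmlEncode(raw);
    }
}
```

For example, EncodeDemo.EncodeCell("<P>my data</P>") returns "&lt;P&gt;my data&lt;/P&gt;", so Excel no longer sees paragraph tags and has no reason to split the cell.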
Hope this helps.
I'm using iTextSharp with VB.NET to get the text content from a PDF file. The solution works fine for some files but not for others, even quite simple ones. The problem is that the token's StringValue is set to null (displayed as a set of empty square boxes):
token = New iTextSharp.text.pdf.PRTokeniser(pageBytes)
While token.NextToken()
tknType = token.TokenType()
tknValue = token.StringValue
I can measure the length of the content, but I cannot get the actual string content.
I realized that this happens depending on the font of the pdf. If I create a pdf using either Acrobat or PdfCreator with Courier (that by the way is the default font in my visual studio editor) I can get all the text content. If the same pdf is built using a different font I got the empty square boxes.
Now the question is, How can I extract text regardless of the font setting?
Thanks
Complementing Mark's answer, which helped me a lot: the iTextSharp namespaces and classes are a bit different from the Java version.
public static string GetTextFromAllPages(String pdfPath)
{
// Dispose the reader when done so the file handle is released
using (PdfReader reader = new PdfReader(pdfPath))
{
StringWriter output = new StringWriter();
for (int i = 1; i <= reader.NumberOfPages; i++)
output.WriteLine(PdfTextExtractor.GetTextFromPage(reader, i, new SimpleTextExtractionStrategy()));
return output.ToString();
}
}
Check out PdfTextExtractor.
String pageText =
PdfTextExtractor.getTextFromPage(myReader, pageNum);
or
String pageText =
PdfTextExtractor.getTextFromPage(myReader, pageNum, new LocationTextExtractionStrategy());
Both require fairly recent versions of iText[Sharp]. Actually parsing the content stream yourself is just reinventing the wheel at this point. Spare yourself some pain and let iText do it for you.
PdfTextExtractor will handle all the different font/encoding issues for you... all the ones that can be handled anyway. If you can't copy/paste from Reader accurately, then there's not enough information present in the PDF to get character information from the content stream.
Here is a variant with iTextSharp.text.pdf.PdfName.ANNOTS and iTextSharp.text.pdf.PdfName.CONTENTS, if someone needs it.
string strFile = @"C:\my\path\tothefile.pdf";
iTextSharp.text.pdf.PdfReader pdfRida = new iTextSharp.text.pdf.PdfReader(strFile);
iTextSharp.text.pdf.PRTokeniser prtTokeneiser;
int pageFrom = 1;
int pageTo = pdfRida.NumberOfPages;
iTextSharp.text.pdf.PRTokeniser.TokType tkntype ;
string tknValue;
for (int i = pageFrom; i <= pageTo; i++)
{
iTextSharp.text.pdf.PdfDictionary cpage = pdfRida.GetPageN(i);
iTextSharp.text.pdf.PdfArray cannots = cpage.GetAsArray(iTextSharp.text.pdf.PdfName.ANNOTS);
if(cannots!=null)
foreach (iTextSharp.text.pdf.PdfObject oAnnot in cannots.ArrayList)
{
iTextSharp.text.pdf.PdfDictionary cAnnotationDictionary = (iTextSharp.text.pdf.PdfDictionary)pdfRida.GetPdfObject(((iTextSharp.text.pdf.PRIndirectReference)oAnnot).Number);
iTextSharp.text.pdf.PdfObject oContents = cAnnotationDictionary.Get(iTextSharp.text.pdf.PdfName.CONTENTS);
if (oContents != null && oContents.GetType() == typeof(iTextSharp.text.pdf.PdfString))
{
string cStringVal = ((iTextSharp.text.pdf.PdfString)oContents).ToString();
if (cStringVal.ToUpper().Contains("LOS 8"))
{ // DO SOMETHING FUN
}
}
}
}
pdfRida.Close();