solr pdf extraction works but no indexing

I am working with Solr to extract PDF files and index them. I am now able to extract the content with the following code:
private static void IndexPDFFile(ISolrOperations<Article> solr)
{
string filecontent = null;
using (var file = File.OpenRead(@"C:\cookbook.pdf"))
{
var response = solr.Extract(new ExtractParameters(file, "abcd1")
{
ExtractOnly = true,
ExtractFormat = ExtractFormat.Text,
});
filecontent = response.Content;
}
solr.Commit();
}
But when I query Solr with the following URLs in the browser, nothing is returned:
http://berserkerpc:444/solr/select/?q=text:solr
or
http://berserkerpc:444/solr/select/?q=author:admin
The content of the PDF file is: This is a Solr cookbook...
The author field should contain something with admin.
Here is the output:
<response><lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
<lst name="params"><str name="q">text:Solr</str></lst></lst><result name="response" numFound="0" start="0"/></response>
Any suggestions for this issue?
Thanks,
tro

This is because you have set ExtractOnly = true in your ExtractParameters. Here is the comment for the ExtractOnly parameter from the SolrNet source code:
/// <summary>
/// If true, return the extracted content from Tika without indexing the document.
/// This literally includes the extracted XHTML as a string in the response.
/// </summary>
public bool ExtractOnly { get; set; }
If you want to index the extracted content, do not set this parameter to true.
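As a minimal sketch of the indexing variant (assuming the same SolrNet setup as in the question; the Article class below is only a hypothetical stand-in for the question's type, and the path and document ID are the question's own values), the call could look like this. ExtractOnly is simply left unset, so it keeps its default of false:

```csharp
using System.IO;
using SolrNet;

// Hypothetical stand-in for the Article type used in the question.
public class Article { }

public static class PdfIndexer
{
    public static void IndexPdfFile(ISolrOperations<Article> solr)
    {
        using (var file = File.OpenRead(@"C:\cookbook.pdf"))
        {
            // ExtractOnly is not set, so it stays false and Solr indexes
            // the extracted content under the document ID "abcd1"
            // instead of only returning it in the response.
            solr.Extract(new ExtractParameters(file, "abcd1"));
        }

        // Commit so that the newly indexed document becomes searchable.
        solr.Commit();
    }
}
```

Whether a query such as text:solr then matches still depends on how the extracted fields are mapped in your Solr schema and request handler configuration.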

Related

Complex/Nested XML Reading in C#

I have been trying to read this XML file however it is complex/nested a good amount compared to the examples I have seen online. I have tried using LINQ and XMLReader with no luck.
LINQ will read each OrderScreen; however, when it comes to the Cell of each OrderScreen it loads all possible Cells into each OrderScreen even if the Cell does not belong to that OrderScreen. I understand why it does it, but I am fairly new to LINQ and most of the examples I see are not this complex and do not really cover this.
XMLReader works pretty well, but it does not continue reading the next Cell after it completes reading one OrderScreen; it just reads the first Cell of the next OrderScreen and then assumes it is at the end of the document. I did not include that code because in all the searches I have done, people use LINQ over XMLReader.
The XML is below first, with my most recent LINQ code after that.
Any help is greatly appreciated!
<Screens>
<DeleteScreens></DeleteScreens>
<NewScreens>
<OrderScreen>
<ScreenNumber></ScreenNumber>
<Title></Title>
<NumberOfColumns></NumberOfColumns>
<OptionScreen></OptionScreen>
<ShowQuantityButtons></ShowQuantityButtons>
<PrepSequenceScreen></PrepSequenceScreen>
<Cell>
<CellNumber></CellNumber>
<CellName></CellName>
<InventoryNumber></InventoryNumber>
...more Cell elements..
<OptionGroup>
<Type></Type>
<ScreenNumber></ScreenNumber>
<Cells></Cells>
</OptionGroup>
...more OptionGroups...
</Cell>
...more Cells...
</OrderScreen>
...more OrderScreens...
</NewScreens>
<UpdateMenus>
<Menu>
<MenuNumber></MenuNumber>
<MenuTitle></MenuTitle>
...more Menu elements...
</Menu>
...more Menus...
</UpdateMenus>
</Screens>
XDocument xdoc;
xdoc = XDocument.Load(@"C:\Users\Kwagstaff\Desktop\PMM_3.0\PMM_3.0\XML\Screens.xml");
var ORDERSCREENS = from a in xdoc.Descendants("OrderScreen")
select new
{
ScreenNumber = a.Element("ScreenNumber").Value,
Title = a.Element("Title").Value,
NumberOfColumns = a.Element("NumberOfColumns").Value,
OptionScreen = a.Element("OptionScreen").Value,
ShowQuantityButtons = a.Element("ShowQuantityButtons").Value,
PrepSequenceScreen = a.Element("PrepSequenceScreen").Value,
Cell = from b in xdoc.Descendants("Cell")
select new
{
CellNumber = b.Element("CellNumber"),
}
};

In my opinion, the proper way to do this is to map the XML to classes decorated with serialization attributes. You will need to do some research, but as an example,
for something like
<MyComplexXML>
....
<xalAddress>...</xalAddress>
<multiPoint>
<MultiPoint>...</MultiPoint>
</multiPoint>
...
</MyComplexXML>
First, you create your classes like this
using System.Xml.Serialization;
namespace MyComplexXML_Model
{
/// <summary>
/// Address field for MyComplexXML
/// </summary>
public class Address
{
/// <summary>
/// XalAddress
/// </summary>
[XmlElement("xalAddress")]
public XalAddress XalAddress;
[XmlElement("multiPoint")]
public MultiPointAddress MultiPointAddress;
}
}
and

using System.Xml.Serialization;
namespace MyComplexXML_Model
{
public class MultiPointAddress
{
[XmlElement("MultiPoint", Namespace = "http://www.sample.net/sample")]
public MultiPoint Multipoint;
}
}
and when your complete hierarchies are in place you can call your root element like this
var ns = new XmlSerializerNamespaces();
ns.Add("sample", "http://www.sample.net/sample");
...
var ms = new MemoryStream();
var sw = new StreamWriter(ms);
//Deserialize from file
var sr = new StreamReader(@"myfile.xml");
var city = (MyComplexXML)new XmlSerializer(typeof(MyComplexXML)).Deserialize(sr);
Hope this points you in the right direction.
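If you would rather stay with LINQ to XML, the symptom described in the question (every OrderScreen receiving all Cells) comes from running the inner query against the whole document. A minimal sketch that scopes the inner query to the current OrderScreen follows; the class name, the method name and the sample data are made up for illustration, and only the element names come from the question:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class OrderScreenReader
{
    // Returns one (ScreenNumber, CellNumbers) pair per OrderScreen element.
    public static (string ScreenNumber, string[] CellNumbers)[] Read(XDocument doc)
    {
        return doc.Descendants("OrderScreen")
            .Select(screen => (
                ScreenNumber: (string)screen.Element("ScreenNumber"),
                // screen.Elements("Cell") only sees direct children of this
                // OrderScreen, unlike doc.Descendants("Cell").
                CellNumbers: screen.Elements("Cell")
                    .Select(cell => (string)cell.Element("CellNumber"))
                    .ToArray()))
            .ToArray();
    }

    public static void Main()
    {
        // Made-up sample data that follows the element names in the question.
        var doc = XDocument.Parse(@"
<Screens><NewScreens>
  <OrderScreen><ScreenNumber>1</ScreenNumber>
    <Cell><CellNumber>10</CellNumber></Cell>
    <Cell><CellNumber>11</CellNumber></Cell>
  </OrderScreen>
  <OrderScreen><ScreenNumber>2</ScreenNumber>
    <Cell><CellNumber>20</CellNumber></Cell>
  </OrderScreen>
</NewScreens></Screens>");

        foreach (var screen in Read(doc))
            Console.WriteLine($"{screen.ScreenNumber}: {string.Join(",", screen.CellNumbers)}");
    }
}
```

With the sample data this prints "1: 10,11" and "2: 20". The same pattern nests one level further for OptionGroup by querying cell.Elements("OptionGroup") inside the Cell projection.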

Swashbuckle parameter descriptions

I'm using SwaggerResponse attributes to decorate my API controller actions. This all works fine; however, when I look at the generated documentation, the description field for parameters is empty.
Is there an attribute-based approach to describe action parameters (rather than XML comments)?

With the latest Swashbuckle, or better said at least the Swashbuckle.AspNetCore variant which I'm using, the Description field for parameters can now be displayed correctly as output.
It does require the following conditions to be met:
XML comments must be enabled and configured with Swagger
Parameters should be explicitly decorated with either [FromRoute], [FromQuery], [FromBody] etc.
The same for the method type (get/post/put etc.), which should be decorated with [Http...]
Describe the parameter as usual with a <param ...> xml comment
A full sample looks like this:
/// <summary>
/// Short, descriptive title of the operation
/// </summary>
/// <remarks>
/// More elaborate description
/// </remarks>
/// <param name="id">Here is the description for ID.</param>
[ProducesResponseType(typeof(Bar), (int)HttpStatusCode.OK)]
[HttpGet, Route("{id}", Name = "GetFoo")]
public async Task<IActionResult> Foo([FromRoute] long id)
{
var response = new Bar();
return Ok(response);
}
With this in place, the description for the id parameter appears in the generated documentation.

You should confirm you are allowing Swagger to use XML comments
httpConfig.EnableSwagger(c => {
if (GetXmlCommentsPath() != null) {
c.IncludeXmlComments(GetXmlCommentsPath());
}
...
...
});
protected static string GetXmlCommentsPath() {
var path = HostingEnvironment.MapPath("path to your xml doc file");
return path;
}
You should also check you are generating XML doc for your desired project. Under your desired project Properties (Alt + Enter on top of the project or Right Click -> Properties) -> Build -> Check XML documentation file

For completeness' sake: when using the latest version of Swashbuckle.AspNetCore (2.1.0) and Swashbuckle.SwaggerGen/Ui (6.0.0), enable XML documentation file generation in your project's Build settings.
Then add the following to your ConfigureServices() method:
services.ConfigureSwaggerGen(options =>
{
options.SingleApiVersion(new Info
{
Version = "v1",
Title = "My API",
Description = "API Description"
});
options.DescribeAllEnumsAsStrings();
var xmlDocFile = Path.Combine(AppContext.BaseDirectory, $"{_hostingEnv.ApplicationName}.xml");
if (File.Exists(xmlDocFile))
{
var comments = new XPathDocument(xmlDocFile);
options.OperationFilter<XmlCommentsOperationFilter>(comments);
options.ModelFilter<XmlCommentsModelFilter>(comments);
}
});

Is there any API for presenting normalized data after a query is made from linq?

For example, ServiceStack does this with the Northwind database:
http://www.servicestack.net/ServiceStack.Northwind/customers/ALFKI?format=html
Is there anything that reads the database structure and relationships and output a report based on a primary id?
Obviously, I am looking into alternatives to servicestack.

I use LINQPad's .Dump() object visualizer for that. Download LINQPad from http://www.linqpad.net and reference the .exe in your project.
You will then have access to LINQPad's .CreateXhtmlWriter(), which can output a beautiful object graph visualization just by going:
var listOfItems = DataContext.Items.ToList();
listOfItems.Dump();
The following is not my code, but I cannot find the origin, so bear with me.
Use an extension method to create the XHTML dump and show it in a browser:
public static class LinqPadExtensions
{
/// <summary>
/// Writes object properties to HTML
/// and displays them in default browser.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="o"></param>
/// <param name="heading"></param>
public static void Dump<T>(
this T o,
string heading = null
)
{
string localUrl =
Path.GetTempFileName() + ".html";
using (
var writer =
LINQPad.Util.CreateXhtmlWriter(true)
)
{
if (!String.IsNullOrWhiteSpace(heading))
writer.Write(heading);
writer.Write(o);
File.WriteAllText(localUrl, writer.ToString());
}
Process.Start(localUrl);
}
}

How to get value from applicationSettings?

I am trying to get the URL of a service in my application from app.config. I have to send it to the part of the application that displays the URL. A web service that I am consuming in this application also uses this setting, so I cannot move it to appSettings.
I want to get this value, 'http://192.168.4.22:82/Service.asmx', through C# code.
<applicationSettings>
<SDHSServer.Properties.Settings>
<setting name="DOServer_WebReference1_Service" serializeAs="String">
<value>http://192.168.4.22:82/Service.asmx</value>
</setting>
</SDHSServer.Properties.Settings>
</applicationSettings>

Not sure I get the question, but
string s = SDHSServer.Properties.Settings.Default.DOServer_WebReference1_Service;
will get it for you.

If I understand you correctly you have two Visual Studio C# projects. The first (project A) has a setting you want to access in the second (project B). To do that you have to perform the following steps:
Add a reference from project B to project A
Change the access modifier of the settings in project A to public (default is internal)
Now you can access the setting in project B, in your case using the fully qualified name SDHSServer.Properties.Settings.Default.DOServer_WebReference1_Service
Note that in the settings editor you can set a value for the setting. This is the default value for the setting and this value is also stored in the App.config file for the project. However, you can override this value by providing another value in the App.config file for the application executing.
In this example, the App.config file for project A will contain the value for the setting which is http://192.168.4.22:82/Service.asmx. However, you can override this in the App.config file for project B to get another value. That is probably not what you want to do but you should be aware of this.

I use this code in an ASP.NET 4.0 site to pull section data out of the 'applicationSettings' section:
public sealed class SiteSupport {
/// <summary>
/// Retrieve specific section value from the web.config
/// </summary>
/// <param name="configSection">Main Web.config section</param>
/// <param name="subSection">Child Section{One layer down}</param>
/// <param name="innersection">Keyed on Section Name</param>
/// <param name="propertyName">Element property name</param>
/// <returns></returns>
/// <example>string setting = NoordWorld.Common.Utilities.SiteSupport.RetrieveApplicationSetting("applicationSettings", "NoordWorld.ServiceSite.Properties.Settings", "ServiceWS_SrvWebReference_Service", "value")</example>
public static string RetrieveApplicationSetting(string configSection, string subSection, string innersection, string propertyName) {
string result = string.Empty;
HttpWorkerRequest fakeWorkerRequest = null;
try {
using (TextWriter textWriter = new StringWriter()) {
fakeWorkerRequest = new SimpleWorkerRequest("default.aspx", "", textWriter);
var fakeHTTPContext = new HttpContext(fakeWorkerRequest);
Configuration config = ConfigurationManager.OpenMappedExeConfiguration(new ExeConfigurationFileMap() { ExeConfigFilename = fakeHTTPContext.Server.MapPath(@"~/Web.config") }, ConfigurationUserLevel.None);
ConfigurationSectionGroup group = config.SectionGroups[configSection];
if (group != null) {
ClientSettingsSection clientSection = group.Sections[subSection] as ClientSettingsSection;
if (clientSection != null) {
SettingElement settingElement = clientSection.Settings.Get(innersection);
if (settingElement != null) {
result = (((SettingValueElement)(settingElement.ElementInformation.Properties[propertyName].Value)).ValueXml).InnerText;
}
}
}
}
} finally {
// Guard against a failure that happened before the worker request was created.
if (fakeWorkerRequest != null) {
fakeWorkerRequest.CloseConnection();
}
}
return result;
}
}
https://www.ServiceWS.com/webservices/Golf

It depends, but something like this:
var s = SDHSServer.Properties.Settings.Default.DOServer_WebReference1_Service;
or
var s = SDHSServer.Properties.Settings.DOServer_WebReference1_Service;

Writing Logs to an XML File with .NET

I am storing logs in an xml file...
In a traditional straight text format approach, you would typically just have a openFile... then writeLine method...
How is it possible to add a new entry into the xml document structure, like you would just with the text file approach?

Use an XmlWriter.
Example code:
public class Quote
{
public string symbol;
public double price;
public double change;
public int volume;
}
public void Run()
{
Quote q = new Quote
{
symbol = "fff",
price = 19.86,
change = 1.23,
volume = 190393,
};
WriteDocument(q);
}
public void WriteDocument(Quote q)
{
var settings = new System.Xml.XmlWriterSettings
{
OmitXmlDeclaration = true,
Indent= true
};
using (XmlWriter writer = XmlWriter.Create(Console.Out, settings))
{
writer.WriteStartElement("Stock");
writer.WriteAttributeString("Symbol", q.symbol);
writer.WriteElementString("Price", XmlConvert.ToString(q.price));
writer.WriteElementString("Change", XmlConvert.ToString(q.change));
writer.WriteElementString("Volume", XmlConvert.ToString(q.volume));
writer.WriteEndElement();
}
}
Example output:
<Stock Symbol="fff">
<Price>19.86</Price>
<Change>1.23</Change>
<Volume>190393</Volume>
</Stock>
See Writing with an XmlWriter for more info.

One of the problems with writing a log file in XML format is that you can't just append lines to the end of the file, because the last line has to be the closing tag of the root element (for the XML to be well-formed).
This blog post by Filip De Vos demonstrates quite a good solution to this:
High Performance writing to XML Log files (edit: link now dead so removed)
Basically, you have two XML files linked together through an external entity declared in the DTD:
Header file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log [
<!ENTITY loglines SYSTEM "loglines.xml">
]>
<log>
&loglines;
</log>
Lines file (in this example, named loglines.xml):
<logline date="2007-07-01 13:56:04.313" text="start process" />
<logline date="2007-07-01 13:56:25.837" text="do something" />
<logline date="2007-07-01 13:56:25.853" text="the end" />
You can then append new lines to the 'lines file', but (most) XML parsers will be able to open the header file and read the lines correctly.
Filip notes that: This XML will not be parsed correctly by every XML parser on the planet. But all the parsers I have used do it correctly.
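Here is a rough sketch of how the two-file approach above could be used from .NET. The class name, method names and temp-folder location are my own choices, not from the blog post. It assumes that DTD processing and an XmlUrlResolver are enabled on the reader, since XmlReader prohibits DTDs by default; only do that for files you trust.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class EntityLog
{
    // Appends one <logline/> element to the lines file; no root element is needed there.
    public static void Append(string linesPath, string text)
    {
        var line = new XElement("logline",
            new XAttribute("date", DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss.fff")),
            new XAttribute("text", text));
        File.AppendAllText(linesPath, line + Environment.NewLine);
    }

    // Reads the header file and lets the parser pull in the lines file via the entity.
    public static XDocument Read(string headerPath)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse, // DTDs are prohibited by default
            XmlResolver = new XmlUrlResolver()   // resolves the external entity file
        };
        using (var reader = XmlReader.Create(headerPath, settings))
            return XDocument.Load(reader);
    }

    public static void Main()
    {
        // Demo files in a temp folder (hypothetical location).
        string dir = Path.Combine(Path.GetTempPath(), "entity-log-demo");
        Directory.CreateDirectory(dir);
        string header = Path.Combine(dir, "log.xml");
        string lines = Path.Combine(dir, "loglines.xml");

        File.WriteAllText(header,
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
            "<!DOCTYPE log [\n<!ENTITY loglines SYSTEM \"loglines.xml\">\n]>\n" +
            "<log>\n&loglines;\n</log>\n");
        File.WriteAllText(lines, string.Empty);

        Append(lines, "start process");
        Append(lines, "the end");

        // Counts the log lines seen through the header document.
        Console.WriteLine(Read(header).Root.Elements("logline").Count());
    }
}
```

Appending stays a cheap text write, and only readers pay the cost of parsing the combined document.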

The big difference is the way you are thinking about your log data. In plain text files you are indeed just adding new lines. XML is a tree structure, however, and you need to think about it as such. What you are adding is probably another node, i.e.:
<log>
<time>12:30:03 PST</time>
<user>joe</user>
<action>login</action>
</log>
Because it is a tree, what you need to ask is which parent you are adding this new node to. This is usually all defined in your DTD (that is, how you define the structure of your data). Hopefully this is more helpful than just naming a library to use, because once you understand this principle the interface of the library should make more sense.

Why reinvent the wheel? Use TraceSource Class (System.Diagnostics) with the XmlWriterTraceListener.
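A minimal sketch of that suggestion, assuming a source name ("DemoApp"), event IDs and a file name that I made up for illustration:

```csharp
using System.Diagnostics;

public static class TraceLogDemo
{
    // Writes two sample events to the given file through an XmlWriterTraceListener.
    public static void WriteSample(string path)
    {
        var source = new TraceSource("DemoApp", SourceLevels.All);
        using (var listener = new XmlWriterTraceListener(path))
        {
            source.Listeners.Add(listener);
            source.TraceEvent(TraceEventType.Information, 1, "start process");
            source.TraceEvent(TraceEventType.Error, 2, "something failed");
            source.Flush();
            source.Listeners.Remove(listener);
        } // disposing the listener closes the file
    }

    public static void Main()
    {
        WriteSample("trace-log.xml");
    }
}
```

Each event is appended as a separate E2ETraceEvent element, so the file is a sequence of XML fragments without a single root element, which is what makes plain appending possible.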

Sorry to post an answer to an old thread. I developed the same thing a long time ago, and I would like to share my full code for a logger that saves log data to a date-named XML file.
Logger class code:
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Threading;
public class BBALogger
{
public enum MsgType
{
Error ,
Info
}
public static BBALogger Instance
{
get
{
if (_Instance == null)
{
lock (_SyncRoot)
{
if (_Instance == null)
_Instance = new BBALogger();
}
}
return _Instance;
}
}
private static BBALogger _Instance;
private static object _SyncRoot = new Object();
private static ReaderWriterLockSlim _readWriteLock = new ReaderWriterLockSlim();
private BBALogger()
{
LogFileName = DateTime.Now.ToString("dd-MM-yyyy");
LogFileExtension = ".xml";
LogPath= Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().Location) + "\\Log";
}
public StreamWriter Writer { get; set; }
public string LogPath { get; set; }
public string LogFileName { get; set; }
public string LogFileExtension { get; set; }
public string LogFile { get { return LogFileName + LogFileExtension; } }
public string LogFullPath { get { return Path.Combine(LogPath, LogFile); } }
public bool LogExists { get { return File.Exists(LogFullPath); } }
public void WriteToLog(String inLogMessage, MsgType msgtype)
{
_readWriteLock.EnterWriteLock();
try
{
LogFileName = DateTime.Now.ToString("dd-MM-yyyy");
if (!Directory.Exists(LogPath))
{
Directory.CreateDirectory(LogPath);
}
var settings = new System.Xml.XmlWriterSettings
{
OmitXmlDeclaration = true,
Indent = true
};
StringBuilder sbuilder = new StringBuilder();
using (StringWriter sw = new StringWriter(sbuilder))
{
using (XmlWriter w = XmlWriter.Create(sw, settings))
{
w.WriteStartElement("LogInfo");
w.WriteElementString("Time", DateTime.Now.ToString());
if (msgtype == MsgType.Error)
w.WriteElementString("Error", inLogMessage);
else if (msgtype == MsgType.Info)
w.WriteElementString("Info", inLogMessage);
w.WriteEndElement();
}
}
using (StreamWriter Writer = new StreamWriter(LogFullPath, true, Encoding.UTF8))
{
Writer.WriteLine(sbuilder.ToString());
}
}
catch (Exception)
{
// Note: any failure while logging is silently ignored here.
}
finally
{
_readWriteLock.ExitWriteLock();
}
}
public static void Write(String inLogMessage, MsgType msgtype)
{
Instance.WriteToLog(inLogMessage, msgtype);
}
}
Call it this way:
BBALogger.Write("pp1", BBALogger.MsgType.Error);
BBALogger.Write("pp2", BBALogger.MsgType.Error);
BBALogger.Write("pp3", BBALogger.MsgType.Info);
MessageBox.Show("done");
I hope my code helps you and others :)

Without more information on what you are doing I can only offer some basic advice to try.
There is a method on most of the XML objects called "AppendChild". You can use this method to add the new node you create with the log comment in it. This node will appear at the end of the item list. You would use the parent element of where all the log nodes are as the object to call on.
Hope that helps.
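To make the AppendChild idea concrete, here is a small sketch using XmlDocument. The element names ("log", "entry"), the "time" attribute and the file location are assumptions for illustration only:

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlDocLog
{
    // Loads (or creates) the log document, appends one entry under the root, and saves it.
    public static void Append(string path, string message)
    {
        var doc = new XmlDocument();
        if (File.Exists(path))
            doc.Load(path);
        else
            doc.AppendChild(doc.CreateElement("log")); // hypothetical root element

        XmlElement entry = doc.CreateElement("entry");
        entry.SetAttribute("time", DateTime.Now.ToString("o"));
        entry.InnerText = message;

        // The parent of all log nodes is the object AppendChild is called on.
        doc.DocumentElement.AppendChild(entry);
        doc.Save(path);
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "xmldoc-log-demo.xml");
        File.Delete(path);
        Append(path, "start process");
        Append(path, "the end");
        Console.WriteLine(File.ReadAllText(path));
    }
}
```

Note that this loads and rewrites the whole file for every entry, so it suits small logs better than large, frequently written ones.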

XML needs a document element (basically a top-level tag that starts and ends the document).
This means a well-formed XML document needs to have a beginning and an end, which does not sound very suitable for logs, where the current "end" of the log is continuously extended.
Unless you are writing batches of self-contained logs, where you write everything to be logged to one file in a short period of time, I'd consider something other than XML.
If you are writing a log of a completed work unit, or a log that doesn't need to be inspected until the whole thing has finished, you could use your approach though: simply open the file, write the log lines, and close the file when the work unit is done.

For editing an XML file, you could also use LINQ to XML. You can see how here:
http://www.linqhelp.com/linq-tutorials/adding-to-xml-file-using-linq-and-c/
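In case the link goes away, a minimal LINQ to XML sketch of the same idea might look like this. The element names ("log", "entry", "time", "user", "action") and the file location are assumptions for illustration, not taken from the linked tutorial:

```csharp
using System;
using System.IO;
using System.Xml.Linq;

public static class LinqXmlLog
{
    // Loads (or creates) the root element, adds one entry, and saves the file.
    public static void Append(string path, string user, string action)
    {
        XElement root = File.Exists(path) ? XElement.Load(path) : new XElement("log");
        root.Add(new XElement("entry",
            new XElement("time", DateTime.Now.ToString("o")),
            new XElement("user", user),
            new XElement("action", action)));
        root.Save(path);
    }

    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "linq-log-demo.xml");
        File.Delete(path);
        Append(path, "joe", "login");
        Append(path, "joe", "logout");
        Console.WriteLine(File.ReadAllText(path));
    }
}
```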
