Lucene DuplicateFilter problems - C#

I'm using Lucene to search over this table of objects:
name        | category
============|==========
John Smith  | Dogs
John Smith  | Cats
I'm using DuplicateFilter to get only one result for each person.
My problem is that when I search for the term "John Smith Dogs" with the DuplicateFilter applied, I get no results. Is there an easy solution to this problem?

I came across this issue recently: with multiple segments, duplicates are not removed across segment boundaries. I wrote a little hack in my own custom DuplicateFilter class that crosses the boundaries the first time it is called and returns the cached bit sets on subsequent calls. I only implemented the CorrectBits method (used for PM_FULL_VALIDATION), but it should be trivial to implement the FastBits one (used for PM_FAST_INVALIDATION) as well, based on the original code.
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
using System;
using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Util;

namespace Lucene.Net.Search
{
    public class DuplicateFilter : Filter
    {
        public static int KM_USE_FIRST_OCCURRENCE = 1;
        public static int KM_USE_LAST_OCCURRENCE = 2;
        public static int PM_FAST_INVALIDATION = 2;
        public static int PM_FULL_VALIDATION = 1;

        private string fieldName;

        /*
         * KeepMode determines which document id to consider as the master, all others being
         * identified as duplicates. Selecting the "first occurrence" can potentially save on IO.
         */
        private int keepMode = KM_USE_FIRST_OCCURRENCE;

        /*
         * "Full" processing mode starts by setting all bits to false and only setting bits
         * for documents that contain the given field and are identified as non-duplicates.
         * "Fast" processing sets all bits to true then unsets all duplicate docs found for the
         * given field. This approach avoids the need to read TermDocs for terms that are seen
         * to have a document frequency of exactly "1" (i.e. no duplicates). While a potentially
         * faster approach, the downside is that bitsets produced will include bits set for
         * documents that do not actually contain the field given.
         */
        private int processingMode = PM_FULL_VALIDATION;

        private object SetupLock = new object();

        public DuplicateFilter(string fieldName) : this(fieldName, KM_USE_LAST_OCCURRENCE, PM_FULL_VALIDATION)
        {
        }

        public DuplicateFilter(string fieldName, int keepMode, int processingMode)
        {
            this.fieldName = fieldName;
            this.keepMode = keepMode;
            this.processingMode = processingMode;
        }

        public string FieldName
        {
            get => fieldName;
            set => fieldName = value;
        }

        public int KeepMode
        {
            get => keepMode;
            set => keepMode = value;
        }

        public int ProcessingMode
        {
            get => processingMode;
            set => processingMode = value;
        }

        private IDictionary<string, (OpenBitSet Filtered, OpenBitSet Hit)> SegmentBits { get; set; } =
            new Dictionary<string, (OpenBitSet Filtered, OpenBitSet Hit)>();

        private bool SetupComplete { get; set; } = false;

        public override bool Equals(object obj)
        {
            if (this == obj)
            {
                return true;
            }
            if ((obj == null) || (obj.GetType() != GetType()))
            {
                return false;
            }
            var other = (DuplicateFilter)obj;
            return keepMode == other.keepMode &&
                processingMode == other.processingMode &&
                (fieldName == other.fieldName || (fieldName != null && fieldName.Equals(other.fieldName)));
        }

        public override DocIdSet GetDocIdSet(IndexReader reader)
        {
            if (processingMode == PM_FAST_INVALIDATION)
            {
                return FastBits(reader);
            }
            else
            {
                return CorrectBits(reader);
            }
        }

        public override int GetHashCode()
        {
            var hash = 217;
            hash = 31 * hash + keepMode;
            hash = 31 * hash + processingMode;
            hash = 31 * hash + fieldName.GetHashCode();
            return hash;
        }

        private OpenBitSet CorrectBits(IndexReader reader)
        {
            SetupCorrect(reader);
            return SegmentBits[((SegmentReader)reader).GetSegmentName()].Filtered;
        }

        private OpenBitSet FastBits(IndexReader reader)
        {
            throw new NotImplementedException();
        }

        private void SetupCorrect(IndexReader reader)
        {
            lock (SetupLock)
            {
                if (!SetupComplete)
                {
                    // Get segment readers and bitsets.
                    var segmentReaders = new Dictionary<string, SegmentReader>();
                    var sis = new SegmentInfos();
                    sis.Read(reader.Directory());
                    foreach (SegmentInfo si in sis)
                    {
                        var r = SegmentReader.Get(true, si, 1);
                        segmentReaders.Add(si.name, r);
                        SegmentBits.Add(si.name, (new OpenBitSet(r.MaxDoc()), new OpenBitSet(r.MaxDoc())));
                    }
                    // Determine duplicates across segments.
                    foreach (var outerKvp in segmentReaders)
                    {
                        var startTerm = new Term(fieldName);
                        var te = outerKvp.Value.Terms(startTerm);
                        if (te != null)
                        {
                            var currTerm = te.Term();
                            while ((currTerm != null) && (currTerm.Field() == startTerm.Field()))
                            {
                                var set = false;
                                var lastKey = outerKvp.Key;
                                var lastDoc = -1;
                                foreach (var innerKvp in segmentReaders)
                                {
                                    var td = innerKvp.Value.TermDocs(currTerm);
                                    if (td.Next())
                                    {
                                        var doc = td.Doc();
                                        if (SegmentBits[innerKvp.Key].Hit.Get(doc))
                                        {
                                            // Term has already been hit; skip it.
                                            set = true;
                                            continue;
                                        }
                                        // Keep track of which terms have already been hit.
                                        SegmentBits[innerKvp.Key].Hit.Set(doc);
                                        if (set)
                                        {
                                            // Only need to set the first term as hit in each segment.
                                            continue;
                                        }
                                        else if (keepMode == KM_USE_FIRST_OCCURRENCE)
                                        {
                                            SegmentBits[innerKvp.Key].Filtered.Set(doc);
                                            set = true;
                                        }
                                        else
                                        {
                                            do
                                            {
                                                lastDoc = td.Doc();
                                                lastKey = innerKvp.Key;
                                            } while (td.Next());
                                        }
                                    }
                                }
                                if (!set && keepMode == KM_USE_LAST_OCCURRENCE)
                                {
                                    SegmentBits[lastKey].Filtered.Set(lastDoc);
                                }
                                if (!te.Next())
                                {
                                    break;
                                }
                                currTerm = te.Term();
                            }
                        }
                    }
                    // Mark setup as complete.
                    SetupComplete = true;
                }
            }
        }
    }
}
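
For reference, here is a minimal usage sketch against the original name/category example, assuming a Lucene.Net 3.x-style API (the directory, analyzer and field names are placeholders for your own setup):

// Keep only one hit per distinct "name" value.
var searcher = new IndexSearcher(directory, true);
var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "name", analyzer);
var query = parser.Parse("John Smith Dogs");
var filter = new DuplicateFilter("name");
var hits = searcher.Search(query, filter, 10);
Console.WriteLine("Matches: " + hits.TotalHits);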

Related

How to Update or Add to a Collection properly in Parallel processing (ForEach)

I am trying to process a list of addresses into two collections: a list of addresses that meet the requirements, and a list of scores for each of those addresses.
I have several different criteria to slice and dice the addresses by, but I am running into an incorrect implementation of modifying a collection during parallel processing.
For instance, I take the zip code and, for each of the 388k+ records, determine whether the Levenshtein distance is within a filtering value. If it is, I add the address to the compiled list of addresses. Then, using the Set method, I determine whether a score for the address already exists; if it does not, I add one, otherwise I use the score currently in the collection and update the appropriate property.
I am hitting the "Collection Has Been Modified" error and cannot think of another way to implement the solution. I am looking for alternative ways to accomplish this (one is sketched after the code below).
private LevenshteinDistance _ld = new LevenshteinDistance();
private int _filter_zip = int.Parse(ConfigHelper.GetAppSettingByKey("filter_zip"));

public Factory ScoreAddresses(Factory model)
{
    Factory result = model;
    IList<Address> _compiledList = new List<Address>();
    IList<Scores> _compiledScores = new List<Scores>();
    ParallelLoopResult MailLoopResult = Parallel.ForEach(
        result.ALL_Addresses, factory =>
        {
            if (factory.MAIL_ZIP_CODE_NBR != null
                && factory.MAIL_ZIP_CODE_NBR.Length >= 1)
            {
                int _zipDistance = _ld.Compute(
                    result.Target_Address.MAIL_ZIP_CODE_NBR,
                    factory.MAIL_ZIP_CODE_NBR);
                if (_zipDistance <= _filter_zip)
                {
                    _compiledList.Add(factory);
                    Scores _score = new Scores();
                    _compiledScores = _score.Set(_compiledScores,
                        factory.INDIVIDUAL_ID, "Levenshtein_MAIL_ZIP_CODE_NBR",
                        _zipDistance, string.Empty, null);
                }
            }
        });
    return result;
}
public IList<Scores> Set(IList<Scores> current, int Individual_ID,
    string Score_Type, int Levenshtein, string Methaphone, decimal? other)
{
    int currentcount = current.Count();
    bool updating = false;
    IList<Scores> result = current;
    try
    {
        Scores newscore = new Scores();
        // Check if scoring for the individual is already present
        Scores lookingforIndv = current.AsParallel().Where(
            p => p.INDIVIDUAL_ID == Individual_ID).FirstOrDefault();
        if (lookingforIndv != null)
        {
            newscore = lookingforIndv;
            updating = true;
        }
        else
        {
            newscore.INDIVIDUAL_ID = Individual_ID;
            updating = false;
        }
        // Add score based upon the indicated score type
        switch (Score_Type)
        {
            case "Levenshtein_MAIL_ZIP_CODE_NBR":
                newscore.Levenshtein_MAIL_ZIP_CODE_NBR = Levenshtein;
                result.Add(newscore);
                break;
            // More and more cases
        }
    }
    catch (Exception e)
    {
    }
    return result;
}
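
One common alternative, sketched here under the assumption that the Address and Scores types are as above: replace the shared List<T> instances with the thread-safe collections from System.Collections.Concurrent, so concurrent adds and lookups no longer throw.

using System.Collections.Concurrent;
using System.Threading.Tasks;

var compiledList = new ConcurrentBag<Address>();
var compiledScores = new ConcurrentDictionary<int, Scores>();

Parallel.ForEach(result.ALL_Addresses, factory =>
{
    if (string.IsNullOrEmpty(factory.MAIL_ZIP_CODE_NBR)) return;
    int zipDistance = _ld.Compute(
        result.Target_Address.MAIL_ZIP_CODE_NBR, factory.MAIL_ZIP_CODE_NBR);
    if (zipDistance <= _filter_zip)
    {
        compiledList.Add(factory);
        // GetOrAdd atomically returns the existing score for this individual or creates one.
        var score = compiledScores.GetOrAdd(
            factory.INDIVIDUAL_ID, id => new Scores { INDIVIDUAL_ID = id });
        // Guard this write with a lock if several threads can score the same individual.
        score.Levenshtein_MAIL_ZIP_CODE_NBR = zipDistance;
    }
});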

Fastest starts-with search algorithm

I need to implement a search algorithm which only searches from the start of the string rather than anywhere within it.
I am new to algorithms, but from what I can see, most of them go through the whole string and find any occurrence.
I have a collection of strings (over 1 million) which needs to be searched every time the user types a keystroke.
EDIT:
This will be an incremental search. I currently have it implemented with the following code, and my searches come back in roughly 300-700 ms from over 1 million possible strings. The collection isn't ordered, but there is no reason it couldn't be.
private ICollection<string> SearchCities(string searchString) {
    return _cityDataSource.AsParallel().Where(x => x.ToLower().StartsWith(searchString)).ToArray();
}
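As an aside, ToLower() allocates a lowercased copy of every candidate string on every keystroke; a case-insensitive StringComparison avoids that allocation. A minimal sketch of the same method with that change:

private ICollection<string> SearchCities(string searchString)
{
    // StringComparison.OrdinalIgnoreCase compares in place, with no per-string allocation.
    return _cityDataSource.AsParallel()
        .Where(x => x.StartsWith(searchString, StringComparison.OrdinalIgnoreCase))
        .ToArray();
}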
I've adapted the code from this article from Visual Studio Magazine that implements a Trie.
The following program demonstrates how to use a Trie to do fast prefix searching.
In order to run this program, you will need a text file called "words.txt" with a large list of words. You can download one from Github here.
After you compile the program, copy the "words.txt" file into the same folder as the executable.
When you run the program, type a prefix (such as "pre") and press return, and it will list all the words beginning with that prefix.
This should be a very fast lookup - see the Visual Studio Magazine article for more details!
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

namespace ConsoleApp1
{
    class Program
    {
        static void Main()
        {
            var trie = new Trie();
            trie.InsertRange(File.ReadLines("words.txt"));
            Console.WriteLine("Type a prefix and press return.");
            while (true)
            {
                string prefix = Console.ReadLine();
                if (string.IsNullOrEmpty(prefix))
                    continue;
                var node = trie.Prefix(prefix);
                if (node.Depth == prefix.Length)
                {
                    foreach (var suffix in suffixes(node))
                        Console.WriteLine(prefix + suffix);
                }
                else
                {
                    Console.WriteLine("Prefix not found.");
                }
                Console.WriteLine();
            }
        }

        static IEnumerable<string> suffixes(Node parent)
        {
            var sb = new StringBuilder();
            return suffixes(parent, sb).Select(suffix => suffix.TrimEnd('$'));
        }

        static IEnumerable<string> suffixes(Node parent, StringBuilder current)
        {
            if (parent.IsLeaf())
            {
                yield return current.ToString();
            }
            else
            {
                foreach (var child in parent.Children)
                {
                    current.Append(child.Value);
                    foreach (var value in suffixes(child, current))
                        yield return value;
                    --current.Length;
                }
            }
        }
    }

    public class Node
    {
        public char Value { get; set; }
        public List<Node> Children { get; set; }
        public Node Parent { get; set; }
        public int Depth { get; set; }

        public Node(char value, int depth, Node parent)
        {
            Value = value;
            Children = new List<Node>();
            Depth = depth;
            Parent = parent;
        }

        public bool IsLeaf()
        {
            return Children.Count == 0;
        }

        public Node FindChildNode(char c)
        {
            return Children.FirstOrDefault(child => child.Value == c);
        }

        public void DeleteChildNode(char c)
        {
            for (var i = 0; i < Children.Count; i++)
                if (Children[i].Value == c)
                    Children.RemoveAt(i);
        }
    }

    public class Trie
    {
        readonly Node _root;

        public Trie()
        {
            _root = new Node('^', 0, null);
        }

        public Node Prefix(string s)
        {
            var currentNode = _root;
            var result = currentNode;
            foreach (var c in s)
            {
                currentNode = currentNode.FindChildNode(c);
                if (currentNode == null)
                    break;
                result = currentNode;
            }
            return result;
        }

        public bool Search(string s)
        {
            var prefix = Prefix(s);
            return prefix.Depth == s.Length && prefix.FindChildNode('$') != null;
        }

        public void InsertRange(IEnumerable<string> items)
        {
            foreach (string item in items)
                Insert(item);
        }

        public void Insert(string s)
        {
            var commonPrefix = Prefix(s);
            var current = commonPrefix;
            for (var i = current.Depth; i < s.Length; i++)
            {
                var newNode = new Node(s[i], current.Depth + 1, current);
                current.Children.Add(newNode);
                current = newNode;
            }
            current.Children.Add(new Node('$', current.Depth + 1, current));
        }

        public void Delete(string s)
        {
            if (!Search(s))
                return;
            var node = Prefix(s).FindChildNode('$');
            while (node.IsLeaf())
            {
                var parent = node.Parent;
                parent.DeleteChildNode(node.Value);
                node = parent;
            }
        }
    }
}
A couple of thoughts:
First, your million strings need to be ordered, so that you can "seek" to the first matching string and return strings until you no longer have a match... in order (seek via C#'s List<string>.BinarySearch, perhaps; see the sketch after this list). That's how you touch the fewest strings possible.
Second, you should probably not try to hit the string list until there's a pause in input of at least 500 ms (give or take).
Third, your queries into the vastness should be async and cancelable, because it's certainly going to be the case that one effort will be superseded by the next keystroke.
Finally, any subsequent query should first check that the new search string is an append of the most recent search string, so that you can begin your subsequent seek from the last seek (saving lots of time).
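A minimal sketch of the first point, assuming the list was sorted once with StringComparer.Ordinal (names here are illustrative):

static IEnumerable<string> PrefixRange(List<string> sorted, string prefix)
{
    int index = sorted.BinarySearch(prefix, StringComparer.Ordinal);
    if (index < 0) index = ~index; // bitwise complement gives the first element >= prefix
    for (int i = index; i < sorted.Count; i++)
    {
        if (!sorted[i].StartsWith(prefix, StringComparison.Ordinal)) yield break;
        yield return sorted[i];
    }
}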
I suggest using LINQ.
string x = "searchterm";
List<string> y = new List<string>();
List<string> Matches = y.Where(xo => xo.StartsWith(x)).ToList();
Where x is your keystroke search text term, y is your collection of strings to search, and Matches is the matches from your collection.
I tested this with the first 1 million prime numbers, here is the code adapted from above:
Stopwatch SW = new Stopwatch();
SW.Start();
string x = "2";
List<string> y = System.IO.File.ReadAllText("primes1.txt").Split(' ').ToList();
y.RemoveAll(xo => xo == " " || xo == "" || xo == "\r\r\n");
List <string> Matches = y.Where(xo => xo.StartsWith(x)).ToList();
SW.Stop();
Console.WriteLine("matches: " + Matches.Count);
Console.WriteLine("time taken: " + SW.Elapsed.TotalSeconds);
Console.Read();
Result is:
matches: 77025
time taken: 0.4240604
Of course, this is testing against numbers, and I don't know whether LINQ converts the values beforehand or whether numbers make any difference.

Most efficient structure for quick access to a random element + the last element on one criterion

We have a function that is notified when we receive an item for a specific timestamp.
The purpose is to wait, for each specific timestamp, until we have received every item we are expecting, and then push the notification further once we are "synchronized" on all items.
Currently, we have a Dictionary<DateTime, TimeSlot> to store the non-synchronized TimeSlots (a TimeSlot is the list of all items we received for a specific timestamp).
// Let's assume that this method is not called concurrently, and only once per "MyItem"
public void HandleItemReceived(DateTime timestamp, MyItem item)
{
    TimeSlot slot;
    // _pendingTimeSlot is a Dictionary<DateTime, TimeSlot>
    if (!_pendingTimeSlot.TryGetValue(timestamp, out slot))
    {
        slot = new TimeSlot(timestamp);
        _pendingTimeSlot.Add(timestamp, slot);
        // Sometimes we don't receive all the items for one timestamp, which may lead to ghost/incomplete TimeSlots
        if (_pendingTimeSlot.Count > _capacity)
        {
            TimeSlot oldestTimeSlot = _pendingTimeSlot.OrderBy(t => t.Key).Select(t => t.Value).First();
            _pendingTimeSlot.Remove(oldestTimeSlot.TimeStamp);
            // Additional work here to log/handle this case
        }
    }
    slot.HandleItemReceived(item);
    if (slot.IsComplete)
    {
        PushTimeSlotSyncronized(slot);
        _pendingTimeSlot.Remove(slot.TimeStamp);
    }
}
We have several instances of this "Synchronizer" running in parallel for different groups of items.
It works fine, except that when the system is under heavy load we get more incomplete TimeSlots and the application uses a lot more CPU. The profiler seems to indicate that the Compare inside the LINQ query is taking most of the time. So I'm trying to find a structure to hold those references (to replace the dictionary).
Here are some metrics:
We have several (variable, but between 10 and 20) instances of this Synchronizer
The current maximum capacity (_capacity) of the synchronizer is 500 items
The shortest interval we can have between two different timestamps is 100 ms, so up to 10 new dictionary entries per second for each Synchronizer (in most cases it is closer to 1 item/second)
For each timestamp, we expect to receive 300-500 items.
So, per second and per Synchronizer (worst case), we will do:
1 Add
500 Gets
3-5 sorts
What would be my best move? I thought of SortedDictionary, but I didn't find any documentation showing how to take the first element according to the key.
The first thing you can try is eliminating the OrderBy - all you need is the minimum key, no need to sort for getting that:
if (_pendingTimeSlot.Count > _capacity) {
    // No Enumerable.Min(DateTime), so doing it manually
    var oldestTimeStamp = DateTime.MaxValue;
    foreach (var key in _pendingTimeSlot.Keys)
        if (oldestTimeStamp > key) oldestTimeStamp = key;
    _pendingTimeSlot.Remove(oldestTimeStamp);
    // Additional work here to log/handle this case
}
As for SortedDictionary, it is certainly an option, although it will consume much more memory. Since it is sorted, you can simply use sortedDictionary.First() to take the key-value pair with the minimum key (hence the oldest element in your case).
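A small sketch of that option, assuming the same _capacity check as in the question (First() needs System.Linq):

var pending = new SortedDictionary<DateTime, TimeSlot>();
// ... adds happen as before ...
if (pending.Count > _capacity)
{
    var oldest = pending.First(); // smallest key = oldest TimeSlot
    pending.Remove(oldest.Key);
}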
UPDATE: Here is a hybrid approach using a dictionary for fast lookups and an ordered doubly linked list for the other scenarios.
class MyItem
{
    // ...
}

class TimeSlot
{
    public readonly DateTime TimeStamp;

    public TimeSlot(DateTime timeStamp)
    {
        TimeStamp = timeStamp;
        // ...
    }

    public bool IsComplete = false;

    public void HandleItemReceived(MyItem item)
    {
        // ...
    }

    // Dedicated members
    public TimeSlot PrevPending, NextPending;
}

class Synchronizer
{
    const int _capacity = 500;
    Dictionary<DateTime, TimeSlot> pendingSlotMap = new Dictionary<DateTime, TimeSlot>(_capacity + 1);
    TimeSlot firstPending, lastPending;

    // Let's assume that this method is not called concurrently, and only once per "MyItem"
    public void HandleItemReceived(DateTime timeStamp, MyItem item)
    {
        TimeSlot slot;
        if (!pendingSlotMap.TryGetValue(timeStamp, out slot))
        {
            slot = new TimeSlot(timeStamp);
            Add(slot);
            // Sometimes we don't receive all the items for one timestamp, which may lead to ghost/incomplete TimeSlots
            if (pendingSlotMap.Count > _capacity)
            {
                // Remove the oldest, which in this case is the first
                var oldestSlot = firstPending;
                Remove(oldestSlot);
                // Additional work here to log/handle this case
            }
        }
        slot.HandleItemReceived(item);
        if (slot.IsComplete)
        {
            PushTimeSlotSyncronized(slot);
            Remove(slot);
        }
    }

    void Add(TimeSlot slot)
    {
        pendingSlotMap.Add(slot.TimeStamp, slot);
        // Starting from the end, search for the first slot having TimeStamp < slot.TimeStamp.
        // If the TimeStamps almost come in order, this is an O(1) op.
        var after = lastPending;
        while (after != null && after.TimeStamp > slot.TimeStamp)
            after = after.PrevPending;
        // Insert the new slot after the found one (if any).
        if (after != null)
        {
            slot.PrevPending = after;
            slot.NextPending = after.NextPending;
            after.NextPending = slot;
            if (slot.NextPending != null)
                slot.NextPending.PrevPending = slot; // fix the back link of the node that now follows slot
            else
                lastPending = slot;
        }
        else
        {
            if (firstPending == null)
                firstPending = lastPending = slot;
            else
            {
                slot.NextPending = firstPending;
                firstPending.PrevPending = slot;
                firstPending = slot;
            }
        }
    }

    void Remove(TimeSlot slot)
    {
        pendingSlotMap.Remove(slot.TimeStamp);
        if (slot.NextPending != null)
            slot.NextPending.PrevPending = slot.PrevPending;
        else
            lastPending = slot.PrevPending;
        if (slot.PrevPending != null)
            slot.PrevPending.NextPending = slot.NextPending;
        else
            firstPending = slot.NextPending; // slot was the first pending; advance the head
        slot.PrevPending = slot.NextPending = null;
    }

    void PushTimeSlotSyncronized(TimeSlot slot)
    {
        // ...
    }
}
Some additional usages:
Iterating from oldest to newest:
for (var slot = firstPending; slot != null; slot = slot.NextPending)
{
    // do something
}
Iterating from oldest to newest and removing items based on a criterion:
for (TimeSlot slot = firstPending, nextSlot; slot != null; slot = nextSlot)
{
    nextSlot = slot.NextPending;
    if (ShouldRemove(slot))
        Remove(slot);
}
Same for reverse scenarios, but using lastPending and PrevPending members instead.
Here is a simple sample. Using the list's Insert method eliminates swapping elements.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            List<Data> inputs = new List<Data>() {
                new Data() { date = DateTime.Parse("10/22/15 6:00AM"), data = "abc" },
                new Data() { date = DateTime.Parse("10/22/15 4:00AM"), data = "def" },
                new Data() { date = DateTime.Parse("10/22/15 6:30AM"), data = "ghi" },
                new Data() { date = DateTime.Parse("10/22/15 12:00AM"), data = "jkl" },
                new Data() { date = DateTime.Parse("10/22/15 3:00AM"), data = "mno" },
                new Data() { date = DateTime.Parse("10/22/15 2:00AM"), data = "pqr" },
            };
            Data data = new Data();
            foreach (Data input in inputs)
            {
                data.Add(input);
            }
        }
    }

    public class Data
    {
        public static List<Data> sortedData = new List<Data>();
        public DateTime date { get; set; }
        public string data { get; set; }

        public void Add(Data newData)
        {
            if (sortedData.Count == 0)
            {
                sortedData.Add(newData);
            }
            else
            {
                Boolean added = false;
                for (int index = sortedData.Count - 1; index >= 0; index--)
                {
                    if (newData.date > sortedData[index].date)
                    {
                        sortedData.Insert(index + 1, newData);
                        added = true;
                        break;
                    }
                }
                if (added == false)
                {
                    sortedData.Insert(0, newData);
                }
            }
        }
    }
}

How do you find out if PDF Textbox field is multi-line using ABCpdf?

I can loop through all the fields in a PDF with ABCpdf using the GetFieldNames() collection and read their properties, but the one I can't seem to get is whether or not a field is a multi-line text field. Is there any example out there of how to find this property? My code is below in case it's helpful, but it's probably unnecessary.
....
foreach (string fieldName in doc.Form.GetFieldNames())
{
    WebSupergoo.ABCpdf9.Objects.Field f = doc.Form[fieldName];
    dt = GetFieldInstances(dt, f);
}
....
private static DocumentTemplate GetFieldInstances(DocumentTemplate dt, WebSupergoo.ABCpdf9.Objects.Field f)
{
    Field field;
    Instance inst = new Instance();
    int instanceCount = 0;
    bool fieldAlreadyExists = dt.Fields.Any(currentField => currentField.Name == f.Name);
    if (!fieldAlreadyExists)
    {
        field = new Field();
        field.Name = f.Name;
        field.Value = f.Value;
        field.Format = f.Format == null ? null : f.Format;
        field.PartialName = f.PartialName;
        field.TypeID = (int)f.FieldType;
        //field.IsMultiline =
        //field.IsRequired =
    }
    else
    {
        field = (from currentField in dt.Fields where currentField.Name == f.Name select currentField).SingleOrDefault();
        instanceCount = field.Instances.Count();
    }
    if ((Field.FieldTypes)f.FieldType == Field.FieldTypes.Radio || (Field.FieldTypes)f.FieldType == Field.FieldTypes.Checkbox)
    {
        inst.ExportValue = f.Options[instanceCount];
    }
    if (f.Kids.Count() > 0)
    {
        f = f.Kids[instanceCount];
    }
    inst.Bottom = (int)f.Rect.Bottom;
    inst.Height = (int)f.Rect.Height;
    inst.Left = (int)f.Rect.Left;
    inst.Width = (int)f.Rect.Width;
    inst.PageNumber = f.Page.PageNumber;
    field.Instances.Add(inst);
    if (!fieldAlreadyExists)
    {
        dt.Fields.Add(field);
    }
    return dt;
}
I figured it out:
public bool GetIsMultiLine(WebSupergoo.ABCpdf9.Objects.Field field)
{
    var flags = Atom.GetInt(Atom.GetItem(field.Atom, "Ff"));
    var isMultiLine = GetIsBitFlagSet(flags, 13);
    return isMultiLine;
}

public bool GetIsRequired(WebSupergoo.ABCpdf9.Objects.Field field)
{
    var flags = Atom.GetInt(Atom.GetItem(field.Atom, "Ff"));
    var isRequired = GetIsBitFlagSet(flags, 2);
    return isRequired;
}

public bool GetIsReadOnly(WebSupergoo.ABCpdf9.Objects.Field field)
{
    var flags = Atom.GetInt(Atom.GetItem(field.Atom, "Ff"));
    var isReadOnly = GetIsBitFlagSet(flags, 1);
    return isReadOnly;
}

private static bool GetIsBitFlagSet(int b, int pos)
{
    return (b & (1 << (pos - 1))) != 0;
}
In case you're not familiar with unsigned integer / binary conversions, I found this site really helpful for understanding them.
For example, let's say that the integer returned for a field's flags equals 4096. If you enter that into the online conversion tool on that website, it will show you that the 13th bit position is turned on (a 1 instead of a 0 in the 13th position, counting from the right).
The ABCpdf guide says that Multi-Line is bit position 13, so you know that field is a multi-line field.
Likewise, it's the 2nd position for Required and the 1st position for Read-only.
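With those helpers, the two commented-out assignments in GetFieldInstances above can be filled in, e.g.:

field.IsMultiline = GetIsMultiLine(f);
field.IsRequired = GetIsRequired(f);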

Compare Two List<T>

How can I compare two lists?
public class Pers_Ordre : IEqualityComparer<Pers_Ordre>
{
    int _LettreVoidID;
    public int LettreVoidID
    {
        get { return _LettreVoidID; }
        set { _LettreVoidID = value; }
    }

    string _OrdreCummul;
    public string OrdreCummul
    {
        get { return _OrdreCummul; }
        set { _OrdreCummul = value; }
    }

    // Products are equal if their names and product numbers are equal.
    public bool Equals(Pers_Ordre x, Pers_Ordre y)
    {
        // Check whether the compared objects reference the same data.
        if (Object.ReferenceEquals(x, y)) return true;
        // Check whether any of the compared objects is null.
        if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
            return false;
        // Check whether the products' properties are equal.
        return x.LettreVoidID == y.LettreVoidID && x.OrdreCummul == y.OrdreCummul;
    }

    // If Equals() returns true for a pair of objects
    // then GetHashCode() must return the same value for these objects.
    public int GetHashCode(Pers_Ordre product)
    {
        // Check whether the object is null
        if (Object.ReferenceEquals(product, null)) return 0;
        // Get hash code for the Name field if it is not null.
        int hashProductName = product.OrdreCummul == null ? 0 : product.OrdreCummul.GetHashCode();
        // Get hash code for the Code field.
        int hashProductCode = product.LettreVoidID.GetHashCode();
        // Calculate the hash code for the product.
        return hashProductName ^ hashProductCode;
    }
}
and I compare like this:
private void simpleButton_Comparer_Click(object sender, EventArgs e)
{
    string LeFile_Client = System.IO.Path.Combine(appDir, @"FA.csv");
    string LeFile_Server = System.IO.Path.Combine(appDir, @"FA_Server.csv");
    List<Pers_Ordre> oListClient = Outils.GetCsv(LeFile_Client).OrderBy(t => t.LettreVoidID).ToList();
    List<Pers_Ordre> oListServert = Outils.GetCsvServer(LeFile_Server).OrderBy(t => t.LettreVoidID).ToList();
    List<Pers_Ordre> LeDiff = new List<Pers_Ordre>();
    LeDiff = oListServert.Except(oListClient).ToList();
    string Noid = "", OdreID = "";
    foreach (var oDiff in LeDiff)
    {
        Noid += oDiff.LettreVoidID + " ";
        OdreID += oDiff.OrdreCummul + " ";
    }
    MessageBox.Show(Noid + "--" + OdreID);
}
I cannot get the right result. The lists contain class objects, and we would like to iterate through one list, look for the same item in a second list, and report any differences: that is, get the objects that are in list A but not in list B, and vice versa.
Your current .Except() call will find items from Server that are missing on the client, but it will not find items on the client that are missing on the server.
Try this:
private void simpleButton_Comparer_Click(object sender, EventArgs e)
{
    string LeFile_Client = System.IO.Path.Combine(appDir, @"FA.csv");
    string LeFile_Server = System.IO.Path.Combine(appDir, @"FA_Server.csv");
    var ListClient = Outils.GetCsv(LeFile_Client).OrderBy(t => t.LettreVoidID);
    var ListServer = Outils.GetCsvServer(LeFile_Server).OrderBy(t => t.LettreVoidID);
    // Pass the comparer explicitly: Except() does not pick up an IEqualityComparer<T>
    // implemented on the element type itself.
    var comparer = new Pers_Ordre();
    var LeDiff = ListServer.Except(ListClient, comparer)
        .Concat(ListClient.Except(ListServer, comparer));
    var result = new StringBuilder();
    foreach (var Diff in LeDiff)
    {
        result.AppendFormat("{0} --{1} ", Diff.LettreVoidID, Diff.OrdreCummul);
    }
    MessageBox.Show(result.ToString());
}
This code should also be significantly faster than your original, as it avoids loading the results into memory until it builds the final string. It performs the equivalent of two separate SQL LEFT JOINs; we could make it faster still by doing one FULL JOIN, but that would require writing our own LINQ operator method as well.
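For completeness, here is a sketch of a single-structure alternative using HashSet<T>, reusing the Pers_Ordre comparer; SymmetricExceptWith keeps exactly the items that appear in one list but not the other:

var comparer = new Pers_Ordre();
var diff = new HashSet<Pers_Ordre>(ListServer, comparer);
diff.SymmetricExceptWith(ListClient); // items present in exactly one of the two lists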
