I work with cellular automata. My repo for my work is here. The basic structure is
1) A grid of
2) cells, which may have
3) agents.
The agents act according to a set of rules, and typically one designates "states" for the agents (agents of different states have different rules). One (relatively) well-known CA is the Game of Life.
I'm trying to expand things a bit more and incorporate other types of "properties" in my CAs, mainly to simulate various phenomena (imagine an animal agent that consumes a plant agent, with the plant's "biomass" or what have you decreasing).
To do this I'm incorporating a normal dictionary, with strings as keys and a struct called CAProperty as the value. The struct is as follows:
public struct CAProperty
{
    public readonly string name;
    public readonly dynamic value;
    //public readonly Type type;

    public CAProperty(string name, dynamic value)
    {
        this.name = name;
        this.value = value;
    }
}
(note: previously I had the "type" variable to enable accurate typing at runtime...but in attempts to solve the issue in this post I removed it. Fact is, it'll need to be added back in)
This is well and good. However, I'm trying to do some work with large grid sizes. 100x100. 1000x1000. 5000x5000, or 25 million cells (and agents). That would be 25 million dictionaries.
See the image: a memory snapshot from Visual Studio for a 4000x4000 grid, or 16 million agents (I tried 5000x5000, but Visual Studio wouldn't let me take a snapshot).
On the right, one can clearly see that the debugger is reading 8 GB memory usage (and I tried this in a release version to see 6875 MB usage). However, when I count up everything in the third column of the snapshot, I arrive at less than 4 GB.
Why is there such a dramatic discrepancy between the total memory usage and the size of objects stored in memory?
Additionally: how might I optimize memory usage (mainly the dictionaries - is there another collection with similar behavior but lower memory usage)?
Edit: For each of the three "components" (Grid, Cell, Agent) I have a class. They all inherit from an original CAEntity class. All are shown below.
public abstract class CAEntity
{
    public CAEntityType Type { get; }
    public Dictionary<string, dynamic> Properties { get; private set; }

    public CAEntity(CAEntityType type)
    {
        this.Type = type;
    }

    public CAEntity(CAEntityType type, Dictionary<string, dynamic> properties)
    {
        this.Type = type;
        if (properties != null)
        {
            this.Properties = new Dictionary<string, dynamic>(properties);
        }
    }
}
public class CAGraph : CAEntity
{
    public ValueTuple<ushort, ushort, ushort> Dimensions { get; }
    public CAGraphCell[,,] Cells { get; }
    List<ValueTuple<ushort, ushort, ushort>> AgentCells { get; set; }
    List<ValueTuple<ushort, ushort, ushort>> Updates { get; set; }
    public CA Parent { get; private set; }
    public GridShape Shape { get; }
    //List<double> times = new List<double>();
    //System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();

    public CAGraph(CA parent, ValueTuple<ushort, ushort, ushort> size, GridShape shape) : base(CAEntityType.Graph)
    {
        this.Parent = parent;
        this.Shape = shape;
        AgentCells = new List<ValueTuple<ushort, ushort, ushort>>();
        Updates = new List<ValueTuple<ushort, ushort, ushort>>();
        Dimensions = new ValueTuple<ushort, ushort, ushort>(size.Item1, size.Item2, size.Item3);
        Cells = new CAGraphCell[size.Item1, size.Item2, size.Item3];
        for (ushort i = 0; i < Cells.GetLength(0); i++)
        {
            for (ushort j = 0; j < Cells.GetLength(1); j++)
            {
                for (ushort k = 0; k < Cells.GetLength(2); k++)
                {
                    Cells[i, j, k] = new CAGraphCell(this, new ValueTuple<ushort, ushort, ushort>(i, j, k));
                }
            }
        }
    }

    public CAGraph(CA parent, ValueTuple<ushort, ushort, ushort> size, GridShape shape, List<ValueTuple<ushort, ushort, ushort>> agents, CAGraphCell[,,] cells, Dictionary<string, dynamic> properties) : base(CAEntityType.Graph, properties)
    {
        Parent = parent;
        Shape = shape;
        AgentCells = agents.ConvertAll(x => new ValueTuple<ushort, ushort, ushort>(x.Item1, x.Item2, x.Item3));
        Updates = new List<ValueTuple<ushort, ushort, ushort>>();
        Dimensions = new ValueTuple<ushort, ushort, ushort>(size.Item1, size.Item2, size.Item3);
        Cells = new CAGraphCell[size.Item1, size.Item2, size.Item3];
        for (ushort i = 0; i < size.Item1; i++)
        {
            for (ushort j = 0; j < size.Item2; j++)
            {
                for (ushort k = 0; k < size.Item3; k++)
                {
                    Cells[i, j, k] = cells[i, j, k].Copy(this);
                }
            }
        }
    }
}
public class CAGraphCell : CAEntity
{
    public CAGraph Parent { get; set; }
    public CAGraphCellAgent Agent { get; private set; }
    public ValueTuple<ushort, ushort, ushort> Position { get; private set; }
    //private Tuple<ushort, ushort, ushort>[][] Neighbors { get; set; }
    //System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();

    public CAGraphCell(CAGraph parent, ValueTuple<ushort, ushort, ushort> position) : base(CAEntityType.Cell)
    {
        this.Parent = parent;
        this.Position = position;
        //this.Neighbors = new Tuple<ushort, ushort, ushort>[Enum.GetNames(typeof(CANeighborhoodType)).Count()][];
    }

    public CAGraphCell(CAGraph parent, ValueTuple<ushort, ushort, ushort> position, Dictionary<string, dynamic> properties, CAGraphCellAgent agent) : base(CAEntityType.Cell, properties)
    {
        this.Parent = parent;
        this.Position = position;
        if (agent != null)
        {
            this.Agent = agent.Copy(this);
        }
    }
}
public class CAGraphCellAgent : CAEntity
{
    // have to change...this has to be a property? Or no, it's a CAEntity which has a list of CAProperties.
    //public int State { get; set; }
    public CAGraphCell Parent { get; set; }
    //System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();

    public CAGraphCellAgent(CAGraphCell parent, ushort state) : base(CAEntityType.Agent)
    {
        this.Parent = parent;
        AddProperty(("state", state));
    }

    public CAGraphCellAgent(CAGraphCell parent, Dictionary<string, dynamic> properties) : base(CAEntityType.Agent, properties)
    {
        this.Parent = parent;
    }
}
It sounds like your problem is that your representation of your agents (using dictionaries) consumes too much memory. If so, the solution is to find a more compact representation.
Since you're working in an object-oriented language, the typical solution would be to define an Agent class, possibly with subclasses for different types of agents, and use instance variables to store the state of each agent. Then your CA grid will be an array of Agent instances (or possibly nulls for vacant cells). This will be a lot more compact than using dictionaries with string keys.
Also, I would recommend not storing the position of the agent on the grid as part of the agent's state, but passing it as a parameter to any methods that need it. Not only does this save a bit of memory just by itself, but it also allows you to place references to the same Agent instance at multiple cells on the grid to represent multiple identical agents. Depending on how often such identical agents occur in your CA, this may save a huge amount of memory.
Note that, if you modify the state of such a reused agent instance, the modification will obviously affect all agents on the grid represented by that instance. For that reason, it may be a good idea to make your Agent objects immutable and just create a new one whenever the agent's state changes.
You might also want to consider maintaining a cache (e.g. a set) of Agent instances already on the grid so that you can easily check if a new agent might be identical with an existing one. Whether this will actually do any good depends on your specific CA model — with some CA you might be able to handle de-duplication sufficiently well even without such a cache (it's perfectly OK to have some duplicate Agent objects), while for others there might simply not be enough identical agents to make it worthwhile. Also, if you do try this, note that you'll need to either design the cache to use weak references (which can be tricky to get right) or periodically purge and rebuild it to avoid old Agent objects lingering in the cache even after they've been removed from the grid.
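To sketch that caching idea (hedged: AgentCache and Intern are my own names, and this assumes your Agent type is immutable and overrides Equals/GetHashCode to compare by state, neither of which is shown in the code above):

using System.Collections.Generic;

// A minimal interning cache: identical agents share one canonical instance.
public static class AgentCache
{
    private static Dictionary<Agent, Agent> cache = new Dictionary<Agent, Agent>();

    // Returns the canonical instance equal to the given agent, adding the
    // agent to the cache if no equal instance is known yet.
    public static Agent Intern(Agent agent)
    {
        if (cache.TryGetValue(agent, out Agent existing)) return existing;
        cache[agent] = agent;
        return agent;
    }

    // Periodically rebuild from the agents actually on the grid, so stale
    // instances don't linger (the simpler alternative to weak references).
    public static void Rebuild(IEnumerable<Agent> liveAgents)
    {
        cache.Clear();
        foreach (var a in liveAgents) cache[a] = a;
    }
}

Calling something like grid.SetAgentAt(x, y, AgentCache.Intern(newAgent)) on insertion would then de-duplicate automatically.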
Addendum based on your comment below, which I'll quote here:
Imagine an environment where the temperature varies seasonally (so a property on the graph). There are land and water cells (so properties on cells), and in low enough temperatures the water cells become frozen so animal agents can use them to cross over between land locations. Imagine those animal agents hunt other animal agents to eat them (so properties on the agents). Imagine the animal agents that get eaten eat trees (so other agents with properties), and tend to eat young saplings (limiting tree growth), thereby limiting their own growth (and that of the carnivore agents).
OK, so let's sketch out the classes you'd need. (Please excuse any syntax errors; I'm not really a C# programmer and I haven't actually tested this code. Just think of it as C#-like pseudocode or something.)
First of all, you will obviously need a bunch of agents. Let's define an abstract superclass for them:
public abstract class Agent {
    public abstract void Act(Grid grid, int x, int y, double time);
}
Our CA simulation (which, for simplicity, I'm going to assume to be stochastic, i.e. one where the agents act one at a time in a random order, like in the Gillespie algorithm) is basically going to involve repeatedly picking a random cell (x, y) on the grid, checking if that cell contains an agent, and if so, calling Act() on that agent. (We'll also need to update any time-dependent global state while we're doing that, but let's leave that for later.)
The Act() methods for the agents will receive a reference to the grid object, and can call its methods to make changes to the state of nearby cells (or even get a reference to the agents in those cells and call their methods directly). This could involve e.g. removing another agent from the grid (because it just got eaten), adding a new agent (reproduction), changing the acting agent's location (movement) or even removing that agent from the grid (e.g. because it starved or died of old age). For illustration, let's sketch a few agent classes:
public class Sapling : Agent {
    private static readonly double MATURATION_TIME = 10;  // arbitrary time delay

    private double birthTime;  // could make this a float to save memory

    public Sapling(double time) => birthTime = time;

    public override void Act(Grid grid, int x, int y, double time) {
        // if the sapling is old enough, it replaces itself with a tree
        if (time >= birthTime + MATURATION_TIME) {
            grid.SetAgentAt(x, y, Tree.INSTANCE);
        }
    }
}
public class Tree : Agent {
    public static readonly Tree INSTANCE = new Tree();

    public override void Act(Grid grid, int x, int y, double time) {
        // trees create saplings in nearby land cells
        (int x2, int y2) = grid.RandomNeighborOf(x, y);
        if (grid.GetAgentAt(x2, y2) == null && grid.CellTypeAt(x2, y2) == CellType.Land) {
            grid.SetAgentAt(x2, y2, new Sapling(time));
        }
    }
}
For the sake of brevity, I'll leave the implementation of the animal agents as an exercise. Also, the Tree and Sapling implementations above are kind of crude and could be improved in various ways, but they should at least illustrate the concept.
One thing worth noting is that, to minimize memory usage, the agent classes above have as little internal state as possible. In particular, the agents don't store their own location on the grid, but will receive it as arguments to the Act() method. Since omitting the location in fact made my Tree class completely stateless, I went ahead and used the same global Tree instance for all trees on the grid! While this won't always be possible, when it is, it can save a lot of memory.
Now, what about the grid? A basic implementation (ignoring the different cell types for a moment) would look something like this:
public class Grid {
    private readonly int width, height;
    private readonly Agent?[,] agents;

    public Grid(int w, int h) {
        width = w;
        height = h;
        agents = new Agent?[w, h];
    }

    // TODO: handle grid edges
    public Agent? GetAgentAt(int x, int y) => agents[x, y];
    public void SetAgentAt(int x, int y, Agent? agent) => agents[x, y] = agent;
}
Now, what about the cell types? You have a couple of ways to handle those.
One way would be to make the grid store an array of Cell objects instead of agents, and have each cell store its state and (possibly) an agent. But for optimizing memory use it's probably better to just have a separate 2D array storing the cell types, something like this:
public enum CellType : byte { Land, Water, Ice }

public class Grid {
    private readonly Random rng = new Random();
    private readonly int width, height;
    private readonly Agent?[,] agents;
    private readonly CellType[,] cells;  // TODO: init in constructor?
    private float temperature = 20;  // global temperature in Celsius

    // ...

    public CellType CellTypeAt(int x, int y) {
        CellType type = cells[x, y];
        if (type == CellType.Water && temperature < 0) return CellType.Ice;
        else return type;
    }
}
Note how the CellType enum is byte-based, which should keep the array storing them a bit more compact than if they were int-based.
Now, let's finally look at the main CA simulation loop. At its most basic, it could look like this:
Grid grid = new Grid(width, height);
grid.SetAgentAt(width / 2, height / 2, Tree.INSTANCE);

// average simulation time per loop iteration, assuming that each
// actor on the grid acts once per unit time step on average
// (note the 1.0: plain 1 / (width * height) would be integer division, i.e. zero)
double dt = 1.0 / (width * height);

for (double t = 0; t < maxTime; t += dt) {
    (int x, int y) = grid.GetRandomLocation();
    Agent? agent = grid.GetAgentAt(x, y);
    if (agent != null) agent.Act(grid, x, y, t);
    // TODO: update temperature here?
}
(Technically, to correctly implement the Gillespie algorithm, the simulation time increment between iterations should be an exponentially distributed random number with mean dt, not a constant increment. However, since only the actor on one of the width * height cells is chosen in each iteration, the number of iterations between actions by the same actor is geometrically distributed with mean width * height, and multiplying this by dt = 1 / (width * height) gives an excellent approximation for an exponential distribution with mean 1. Which is a long-winded way of saying that in practice using a constant time step is perfectly fine.)
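(If you ever do want the exact exponential increments, inverse transform sampling is a one-liner; rng here is any Random instance:)

// Exponentially distributed time step with mean dt; NextDouble() returns
// a uniform value in [0, 1), so 1 - u is in (0, 1] and the log stays finite.
double u = rng.NextDouble();
t += -Math.Log(1.0 - u) * dt;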
Since this is getting long enough, I'll let you continue from here. I'll just note that there are plenty of ways to further expand and/or optimize the algorithm I've sketched above.
For example, you could speed up the simulation by maintaining a list of all grid locations that contain a live actor and randomly sampling actors from that list (but then you'd also need to scale the time step by the inverse of the length of the list). Also, you may decide that you want some actors to get more frequent chances to act than others; while the simple way to do that is just to use rejection sampling (i.e. have the actor only do something if rng.NextDouble() < prob for some prob between 0 and 1), a more efficient way would be to maintain multiple lists of locations depending on the type of the actor there.
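As a sketch of that first idea (LocationList is my own class, not from any library): keep the occupied locations in a list with O(1) sampling and O(1) swap-with-last removal, at the cost of ordering:

using System;
using System.Collections.Generic;

public class LocationList
{
    private readonly List<(int x, int y)> locations = new List<(int x, int y)>();

    public int Count => locations.Count;

    public void Add(int x, int y) => locations.Add((x, y));

    // Swap the removed entry with the last one so removal is O(1).
    public void RemoveAt(int index)
    {
        locations[index] = locations[locations.Count - 1];
        locations.RemoveAt(locations.Count - 1);
    }

    // Uniformly sample an occupied location.
    public (int x, int y) Sample(Random rng) => locations[rng.Next(locations.Count)];
}

If you sample from such a list instead of the whole grid, remember to use 1.0 / Count as the per-iteration time step instead of 1.0 / (width * height).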
Related
I have a sketch with 3D line segments. Each segment has a start and an end 3D point. My task is to merge segments if they are parallel and connected. I've implemented it in C#. The algorithm is recursive. Can this code be optimized? Can it be made non-recursive?
/// <summary>
/// Merges segments if they compose a straight line.
/// </summary>
/// <param name="segments">List of segments.</param>
/// <returns>Merged list of segments.</returns>
internal static List<Segment3d> MergeSegments(List<Segment3d> segments)
{
    var result = new List<Segment3d>(segments);
    for (var i = 0; i < result.Count - 1; i++)
    {
        var firstLine = result[i];
        for (int j = i + 1; j < result.Count; j++)
        {
            var secondLine = result[j];
            var startToStartConnected = firstLine.P1.Equals(secondLine.P1);
            var startToEndConnected = firstLine.P1.Equals(secondLine.P2);
            var endToStartConnected = firstLine.P2.Equals(secondLine.P1);
            var endToEndConnected = firstLine.P2.Equals(secondLine.P2);
            if (firstLine.IsParallel(secondLine) && (startToStartConnected || startToEndConnected || endToStartConnected || endToEndConnected))
            {
                Segment3d mergedLine = null;
                if (startToStartConnected)
                {
                    mergedLine = new Segment3d(firstLine.P2, secondLine.P2);
                }
                if (startToEndConnected)
                {
                    mergedLine = new Segment3d(firstLine.P2, secondLine.P1);
                }
                if (endToStartConnected)
                {
                    mergedLine = new Segment3d(firstLine.P1, secondLine.P2);
                }
                if (endToEndConnected)
                {
                    mergedLine = new Segment3d(firstLine.P1, secondLine.P1);
                }
                // Remove duplicate.
                if (firstLine == secondLine)
                {
                    mergedLine = new Segment3d(firstLine.P1, firstLine.P2);
                }
                result.Remove(firstLine);
                result.Remove(secondLine);
                result.Add(mergedLine);
                result = MergeSegments(result);
                return result;
            }
        }
    }
    return result;
}
The classes Segment3D and Point3D are pretty simple:
Class Segment3D:
internal class Segment3d
{
    public Segment3d(Point3D p1, Point3D p2)
    {
        this.P1 = p1;
        this.P2 = p2;
    }

    public bool IsParallel(Segment3d segment)
    {
        // check if segments are parallel
        return true;
    }

    public Point3D P1 { get; }
    public Point3D P2 { get; }
}
internal class Point3D
{
    public double X { get; set; }
    public double Y { get; set; }
    public double Z { get; set; }

    public override bool Equals(object obj)
    {
        // Implement equality logic.
        return true;
    }
}
The optimizations
You are asking about a way to remove the recursion. However, that isn't the only, nor the largest, problem with your current solution, so I will try to give you an outline of possible directions for optimization. Unfortunately, since your code still isn't a self-contained minimal reproducible example, it is rather tricky to debug. If I have time in the future, I might revisit.
First step: Limiting the number of comparisons.
Currently, you are performing an unnecessary number of comparisons, since you compare every possible pair of line segments and every possible alignment they can have. This is unnecessary.
The first step to lowering the number of comparisons is to separate the line segments by their direction. Currently, when you compare two segments, you first check whether their directions align, and only then proceed with the rest of the comparisons.
If we sort the segments by direction, we naturally group them into buckets of a sort. Sorting the segments might sound weird, since there are three axes to sort by. That is fine, though, because the only thing we actually care about is that if two normalized direction vectors (x,y,z) and (a,b,c) differ in at least one coordinate, the segments are not parallel.
This sorting can be done by implementing the IComparable interface and then calling the Sort method as you normally would. This has already been described, for example, here. The sorting has complexity O(N log N), which is quite a lot better than your current O(N^2) approach. (Actually, your algorithm is even worse than that, since during each recursion step you do another N^2 steps, so you might be dangerously close to O(N^4) territory. There be dragons.)
Note: If you are feeling adventurous and O(N log N) is not enough, it could be possible to get into O(N) territory by hashing the direction vectors (e.g. via a GetHashCode override built with System.HashCode) and finding the groups of parallel lines using sets or dictionaries. However, this might be rather tricky, since floating-point numbers and comparison for equality aren't exactly a friendly combination. Caution is required; sorting should be enough for most use cases.
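For instance, here's a rough, untested sketch of the sorting approach (I'm using IComparer<T> passed to List<T>.Sort rather than IComparable<T>; the Epsilon tolerance and the canonical flip are my own choices, not something from the question's code):

using System;
using System.Collections.Generic;

internal class DirectionComparer : IComparer<Segment3d>
{
    private const double Epsilon = 1e-9;

    public int Compare(Segment3d a, Segment3d b)
    {
        var (ax, ay, az) = Direction(a);
        var (bx, by, bz) = Direction(b);
        // Compare component-wise with a tolerance; exact floating-point
        // equality would split nearly identical directions apart.
        if (Math.Abs(ax - bx) > Epsilon) return ax.CompareTo(bx);
        if (Math.Abs(ay - by) > Epsilon) return ay.CompareTo(by);
        if (Math.Abs(az - bz) > Epsilon) return az.CompareTo(bz);
        return 0;  // same bucket: candidates for being parallel
    }

    private static (double, double, double) Direction(Segment3d s)
    {
        double dx = s.P2.X - s.P1.X, dy = s.P2.Y - s.P1.Y, dz = s.P2.Z - s.P1.Z;
        double len = Math.Sqrt(dx * dx + dy * dy + dz * dz);
        dx /= len; dy /= len; dz /= len;
        // Canonical flip: (1,0,0) and (-1,0,0) are parallel and should group together.
        if (dx < 0 || (dx == 0 && (dy < 0 || (dy == 0 && dz < 0))))
        {
            dx = -dx; dy = -dy; dz = -dz;
        }
        return (dx, dy, dz);
    }
}

After segments.Sort(new DirectionComparer()), parallel segments sit in contiguous runs that you can then process group by group.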
So now we have the line segments separated by direction. Even if you proceed with your current approach from here, the results should be much better, since we have lowered the size of the groups being compared. However, we can go further.
Second step: Smart comparison of the segment ends.
Since you want to merge segments whenever any of their endpoints coincide (you covered the start-start, start-end, end-start, and end-end combinations in your question), we can simply start merging any two identical points we find. The easiest way to do that is, once more, either sorting or hashing. Since we won't be normalizing anything here (normalization is what can introduce those marginal differences), hashing should be viable; however, sorting is still the more straightforward and "safer" way to go.
One of the many ways this could be done (a sketch for step 2 follows after the complexity note below):
1) Put all of the endpoints into a single linked list as tuples of (endpoint, parent segment). Arrays are not suitable, since deletion would be costly.
2) Sort the list by the endpoints. (Once again, you will need to implement the IComparable interface for your points.)
3) Go through the list. Whenever two neighboring points are equal, remove them and merge their siblings into a new line segment (or delete one of them and the closer sibling if the neighbors are both the starting or ending point of the segment).
4) When you have passed through the whole list, all the segments are either merged or have no more neighbors.
5) Pick up your survivors and you are done.
The complexity of this step is once again O(N log N), as in the previous case. No recursion necessary, and the speedup should be nice.
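For step 2, one possible shape for that comparison (an untested sketch of the IComparable implementation added to the question's Point3D; plain lexicographic ordering suffices, since these coordinates were never normalized):

using System;

internal class Point3D : IComparable<Point3D>
{
    public double X { get; set; }
    public double Y { get; set; }
    public double Z { get; set; }

    // Order by X, then Y, then Z; equal points compare as 0 and therefore
    // end up as neighbors in the sorted list.
    public int CompareTo(Point3D other)
    {
        int c = X.CompareTo(other.X);
        if (c != 0) return c;
        c = Y.CompareTo(other.Y);
        return c != 0 ? c : Z.CompareTo(other.Z);
    }
}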
I have a large multi-dimensional array that needs to be stored with protobuf. The array could have up to 5120*5120 = 26,214,400 items in it. Protobuf does not support storing multi-dimensional arrays, unfortunately.
As a test, I wrote two functions and an extra class. The class stores an x,y pair that points to the location inside the array (array[x, y]). The class has a "value" that is the data from array[x, y]. I use a List to store this data.
When I generate a fairly small array (1024*1024) I get an output file that is over 169MB. From my testing, it loads and generates the file extremely fast so there's no issue there. However, the file size is huge - I definitely need to cut down on size.
Is this a normal file size, or do I need to rethink my entire process? Should I compress the data before saving it (zipping the file takes it from 169MB to 6MB)? If so, what's the fastest/easiest way to zip a file in C#?
This is pseudo code that is based on my real code.
[ProtoContract]
public class Example
{
    [ProtoIgnore]
    public string[,] MyArray { get; set; }

    // Protobuf field numbers must be >= 1; 0 is not a valid tag.
    [ProtoMember(1)]
    private List<MultiArray> Storage { get; set; }

    public void MoveToList()
    {
        Storage = new List<MultiArray>();
        for (int x = 0; x < MyArray.GetLength(0); x++)
        {
            for (int y = 0; y < MyArray.GetLength(1); y++)
            {
                Storage.Add(new MultiArray
                {
                    X = x,
                    Y = y,
                    Value = MyArray[x, y]
                });
            }
        }
    }

    public void MoveToArray()
    {
        MyArray = new string[1024, 1024];
        for (int i = 0; i < Storage.Count; i++)
        {
            MyArray[Storage[i].X, Storage[i].Y] = Storage[i].Value;
        }
    }
}

[ProtoContract]
public class MultiArray
{
    [ProtoMember(1)]
    public int Y { get; set; }

    [ProtoMember(2)]
    public int X { get; set; }

    [ProtoMember(3)]
    public string Value { get; set; }
}
Notes: The value must be the correct x/y of the array.
I appreciate any suggestions.
I don't know about the storage side, but this is probably not the right way to do it.
The way you are doing it, you create a MultiArray object for every cell of your array.
A simpler and more efficient solution would be to do this:
String[] Storage = new String[1024 * 1024];
int width = 1024;
int height = 1024;
for (int x = 0; x < width; x++)
{
    for (int y = 0; y < height; y++)
    {
        // works here because width == height; in general the row stride should be height
        Storage[x * width + y] = MyArray[x, y];
    }
}
Ultimately, the protobuf format doesn't have a concept of arrays of higher dimension than one.
At the library level, since you're using protobuf-net, we could have the library do some magic here, essentially treating it as:
message Array_T {
    repeated int32 dimensions = 1;
    repeated T items = 2;  // packed when possible
}
(noting that .proto doesn't actually support generics, but that doesn't really matter at the library level)
However, this would be a little awkward from a cross-platform perspective.
But to test whether this would help, you could linearize your 2D array, and see what space it takes.
In your case, I suspect the real problem (re the size) is the quantity of strings. Protobuf writes string contents every time, without any attempt at lookup tables. It may also be worth checking what the sum total of the string lengths (in UTF-8 bytes) is for your array contents.
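To illustrate what such a lookup table could look like on top of plain protobuf-net (a hedged sketch, not a library feature; CompactGrid and all its members are hypothetical names, and it assumes non-null strings):

using System.Collections.Generic;
using ProtoBuf;

[ProtoContract]
public class CompactGrid
{
    [ProtoMember(1)]
    public int Width { get; set; }

    [ProtoMember(2)]
    public int Height { get; set; }

    // Each distinct string is stored exactly once.
    [ProtoMember(3)]
    public List<string> Values { get; set; } = new List<string>();

    // One small varint per cell, pointing into Values.
    [ProtoMember(4, IsPacked = true)]
    public int[] Indexes { get; set; }

    public static CompactGrid FromArray(string[,] array)
    {
        var grid = new CompactGrid { Width = array.GetLength(0), Height = array.GetLength(1) };
        grid.Indexes = new int[grid.Width * grid.Height];
        var table = new Dictionary<string, int>();
        for (int x = 0; x < grid.Width; x++)
        {
            for (int y = 0; y < grid.Height; y++)
            {
                string s = array[x, y];
                if (!table.TryGetValue(s, out int index))
                {
                    index = grid.Values.Count;
                    grid.Values.Add(s);
                    table[s] = index;
                }
                grid.Indexes[x * grid.Height + y] = index;
            }
        }
        return grid;
    }
}

With many repeated strings, this should shrink the payload dramatically, since each cell costs a small integer rather than a full string.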
I have been trying to work with Unity's pure ECS approach to make a basic nxn grid of tiles.
Before using ECS I would just use a 2d array Tiles[,] grid where I can simply index (x,y) to find the tile I want.
Now moving into ECS, I want to create an entity for each tile using an IComponentData struct like so:
public struct Tile : IComponentData
{
    public int xIndex;
    public int yIndex;
    public int isValid;
}
Somewhere during the start of my game I create the tile entities
for (int i = 0; i < tilesInMap; i++)
{
    Entity tileEntity = entityManager.CreateEntity(tileArchetype);
    int xIndex = i % terrainSize;
    int yIndex = i / terrainSize;
    entityManager.SetComponentData(tileEntity, new Position { Value = new float3(xIndex, 0, yIndex) });

    Tile newTile;
    newTile.xIndex = xIndex;
    newTile.yIndex = yIndex;
    newTile.isValid = 1;
    entityManager.SetComponentData(tileEntity, newTile);
}
Now somewhere else in my code I have a ComponentSystem that pulls in the tiles using a group struct
public struct TileGroup
{
    public readonly int Length;
    public EntityArray entity;
    public ComponentDataArray<Tile> tile;
}

[Inject] TileGroup m_tileGroup;
As far as I'm aware, as long as I don't change the archetype of any of those entities in the TileGroup (for instance by calling AddComponent or RemoveComponent), the ordering of that injected array will be preserved (I'm not even sure of that!)
So say in my OnUpdate method I want to "get the tile at grid coordinate (23, 43)". I can calculate this (assuming a 1d array of tiles with the order preserved) by
int arrayIndex = yIndex * tilesPerMapWidth + xIndex;
But this only works as long as the injected array order of tiles is preserved which I doubt it will be eventually.
So a few questions:
1. Should I ever rely on the order of an injected array for behaviour logic?
2. Are there any better methods of achieving what I want using ECS?
From the Unity forums, "Order of Entities":
ComponentDataArray<> / ComponentGroup makes zero guarantees on the actual ordering of your entities, except that it's deterministic. But generally the indices are not stable. (E.g. adding a component to an entity will change the index in the ComponentDataArray of that entity, and likely of another one.) The only stable ID is the "Entity" ID.
To answer your questions:
So a few questions: 1. Should I ever rely on the order of an injected array for behaviour logic?
Based on "...makes zero guarantees on the actual ordering of your entities..." I would say no.
Are there any better methods of achieving what I want using ECS?
You don't need arrayIndex to access entities in your tile group. You can iterate through the entities and access x and y for each to perform the required behavior. Depending on your actual functionality you may want to use the job system as well.
for (int i = 0; i < m_tileGroup.Length; i++)
{
    int x = m_tileGroup.tile[i].xIndex;
    int y = m_tileGroup.tile[i].yIndex;
    // do something with x and y
}
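If you really do need random access by coordinate, one hedged workaround (plain C# on top of the injected group shown above, no extra ECS machinery; rebuild it whenever the group may have been reordered):

// Build a coordinate -> array-index map so a lookup like (23, 43) no
// longer depends on injection order. (using System.Collections.Generic;)
var indexByCoord = new Dictionary<(int x, int y), int>();
for (int i = 0; i < m_tileGroup.Length; i++)
{
    indexByCoord[(m_tileGroup.tile[i].xIndex, m_tileGroup.tile[i].yIndex)] = i;
}

// Usage: fetch the tile at grid coordinate (23, 43), if present.
if (indexByCoord.TryGetValue((23, 43), out int idx))
{
    Tile tile = m_tileGroup.tile[idx];
    // do something with tile
}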
Related question: Getting a string index based on a pixel offset
I know this is close to that question, but this isn't asking how to do it directly, it's asking how to fake it best.
I am implementing my own text box for Windows Forms (because RichTextBox sucks) and I am trying to find the best way to, given strings that have been drawn on the screen, calculate what character the mouse is over. The problem is that characters can be variable-width.
I have come up with two possibilities:
Do Graphics.MeasureCharacterRanges every time the mouse moves, in a binary-search fashion, on the line that the mouse is over (as suggested in the question linked at the top)
Keep a list of the offset of every character of every line.
(1) will have bad performance, and (2) will be memory-inefficient. On top of that, (2) makes typing a character an O(n) operation (because you have to adjust the offset of every character after it), and it's impossible to do precisely because Graphics.MeasureCharacterRanges isn't precise: it returns one value for one character, another value for another character, and a totally different value (that does not equal the two previous values added together) for both of them together in one string. E.g. W will be 16 pixels wide and f will be 5 pixels wide, but Wf is 20 pixels wide. (Those numbers are from an actual test.)
So I am looking for a better strategy to do this, preferably one that requires minimal space and O(1) computational complexity (though I will gladly trade off a little memory efficiency for speed efficiency).
I don't think you have to do O(1). O(1) assumes that every additional character has an effect on ALL previous characters, which it would not. At best I would see an O(1) cost per word, which should be crazy fast. It sounds like what you need is a way to store: (1) the location of each word, (2) each unique word, and (3) the width of each letter in the word. This would significantly reduce the storage and increase lookup speed. Maybe something like:
IEnumerable<TextLocation> TextLocations = ...;

internal class TextLocation
{
    public RectF BoundingBox { get; set; }  // this is relative to the textbox
    public TextWord TextWord { get; set; }
}

internal class TextWord
{
    public string Text { get; set; }
    public IEnumerable<LetterInfo> Letters { get; set; }
}

internal class LetterInfo
{
    public char Letter { get; set; }
    public float Left { get; set; }   // these would be relative to the bounding box,
    public float Right { get; set; }  // not to the textbox
}
Then you might be able to do something like
var tl = TextLocations.FirstOrDefault(x => x.BoundingBox.Left < Mouse.X
                                        && x.BoundingBox.Right > Mouse.X
                                        && x.BoundingBox.Top < Mouse.Y
                                        && x.BoundingBox.Bottom > Mouse.Y);
if (tl != null)
{
    // tl.TextWord.Text is the word ("The", "Lazy", "Dog"...)
    var letter = tl.TextWord.Letters
        .FirstOrDefault(x => Mouse.X - tl.BoundingBox.Left > x.Left
                          && Mouse.X - tl.BoundingBox.Left < x.Right);
    if (letter != null)
    {
        // you get the idea
    }
}
I'm sifting through some of my old bugs and while reviewing some nasty code I realized that my averaging or smoothing algorithm was pretty bad. I did a little research which led me to the "running mean" - makes sense, pretty straightforward. I was thinking through a possible implementation and realized that I don't know which collection would provide the type of "sliding" functionality that I need. In other words, I need to push/add an item to the end of the collection and then also pop/remove the first item from the collection. I think if I knew what this was called I could find the correct collection but I don't know what to search for.
Ideally a collection where you set the max size and anything added to it that exceeds that size would pop off the first item.
To illustrate, here is what I came up with while messing around:
using System;
using System.Collections.Generic;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            LinkedList<int> samples = new LinkedList<int>();

            // Simulate packing the front of the samples; this would most likely be a pre-averaged
            // value from the raw samples
            for (int i = 0; i < 10; i++)
            {
                samples.AddLast(0);
            }

            for (int i = 0; i < 100; i++)
            {
                // My attempt at a "sliding collection" - not really sure what to call it, but as
                // an item is added the first item is removed
                samples.RemoveFirst();
                samples.AddLast(i);

                foreach (int v in samples)
                {
                    Console.Write("{0:000} ", v);
                }
                Console.WriteLine(String.Empty);
            }
            Console.ReadLine();
        }
    }
}
As you can see I am manually handling the removal of the first item. I'm just asking if there is a standard collection that is optimized for this type of use?
It appears that you're looking for a Circular Buffer. Here's a .NET implementation on CodePlex. You may also want to look at this question: How would you code an efficient Circular Buffer in Java or C#?
From the sample you've provided, it isn't clear how exactly this relates to an online mean algorithm. If the only operation allowed on the buffer is appending, it should be trivial to cache and update the "total" inside the buffer (add the new value, subtract the removed one), making the maintenance of the mean an O(1) operation per append. In this case, the buffer is effectively holding the simple moving average (SMA) of a series.
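To make the cached-total idea concrete, here's a minimal sketch (my own naming; assumes a fixed-size circular buffer as described):

using System;

class CachedMeanBuffer
{
    private readonly double[] _buffer;
    private double _total;
    private int _pos, _count;

    public CachedMeanBuffer(int size) => _buffer = new double[size];

    public double Push(double value)
    {
        _total += value - _buffer[_pos];  // drop the overwritten sample from the total
        _buffer[_pos] = value;
        _pos = (_pos + 1) % _buffer.Length;
        if (_count < _buffer.Length) _count++;
        return _total / _count;
    }

    public double Mean => _count == 0 ? 0 : _total / _count;
}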
Have you had a look at the Queue class?
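For example, a minimal sketch of a fixed-size window on top of Queue<T> (maxSize and AddSample are my placeholders):

using System.Collections.Generic;

var window = new Queue<double>();
int maxSize = 10;  // window length

void AddSample(double sample)
{
    window.Enqueue(sample);
    if (window.Count > maxSize)
    {
        window.Dequeue();  // discard the oldest sample
    }
}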
Does a List satisfy your needs?
List<String> myList = new List<String>();
myList.Add("Something to the end");
myList.RemoveAt(0);
#Ani - I'm creating a new answer instead of a comment because I have some code to paste. I took a swing at a dead-simple object to assist with my running mean and came up with the following:
class RollingMean
{
    int _pos;
    int _count;
    double[] _buffer;

    public RollingMean(int size)
    {
        _buffer = new double[size];
        _pos = 0;
        _count = 0;
    }

    public RollingMean(int size, double initialValue)
        : this(size)
    {
        // Believe it or not there doesn't seem to be a better (performance) way...
        for (int i = 0; i < size; i++)
        {
            _buffer[i] = initialValue;
        }
        _count = size;
    }

    public double Push(double value)
    {
        _buffer[_pos] = value;
        _pos = (++_pos > _buffer.Length - 1) ? 0 : _pos;  // wrap around the circular buffer
        _count = Math.Min(++_count, _buffer.Length);
        return Mean;
    }

    public double Mean
    {
        get
        {
            return _buffer.Sum() / _count;  // requires using System.Linq;
        }
    }
}
I'm reading 16 channels of data from a data acquisition system, so I will just instantiate one of these for each channel. I think it will be cleaner than managing a multi-dimensional array or a separate set of buffer/position variables for each channel.
Here is sample usage for anyone interested:
static void Main(string[] args)
{
    RollingMean mean = new RollingMean(10, 7);
    mean.Push(3);
    mean.Push(4);
    mean.Push(5);
    mean.Push(6);
    mean.Push(7.125);
    Console.WriteLine(mean.Mean);
    Console.ReadLine();
}
I was going to make the RollingMean object generic rather than locked into double, but I couldn't find a generic constraint to limit the type to numerical types. I moved on; gotta get back to work. Thanks for your help.