Monday, November 21, 2011

Primary Key Variables and Rule Activations in Pachinko

While Pachinko is a Rete-inspired rule engine, it contains features that make it suitable for stream-oriented event processing. One such feature is its use of Primary Key variables.

In the simplest form of rule engines, a rule fires any time a fact the rule depends on is modified. This is how Pachinko operates if all the rule parameters used are Variables.

However, in a stream processing environment, it's not always desirable to have a rule fire every time any one of its parameters changes. For example, a rule that compares the share price of a seldom-traded stock against the current Dow Jones average would consume significant compute resources if it recalculated every time either the share price or the Dow changed: the share price might only change a couple of times a day, but the Dow fluctuates continuously.

In that case, what's desired is a way to tell the engine to recalculate only when an important value (the share price) changes, using the most recent unimportant value (the Dow) without recalculating every time that unimportant value changes.

To make Pachinko do this, use PKVariables for the important values and Variables for the unimportant values. The rule won't fire at all until all variables have received values, but thereafter, the rule will only recalculate when one of the PKVariables changes value.
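The activation semantics described above can be sketched in plain Java, independent of Pachinko's actual internals (the class and method names below are illustrative, not part of Pachinko's API):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A minimal sketch of the activation semantics described above: the rule
// fires once when every variable has received a value, and thereafter only
// when a primary-key variable changes value.
public class PkActivationSketch {
    private final Set<String> pkVariables = new HashSet<>();
    private final Set<String> required = new HashSet<>();
    private final Map<String, Object> values = new HashMap<>();
    private boolean activated = false;
    private int recalculations = 0;

    public PkActivationSketch(Set<String> pkVars, Set<String> ordinaryVars) {
        pkVariables.addAll(pkVars);
        required.addAll(pkVars);
        required.addAll(ordinaryVars);
    }

    // Returns true if this update caused the rule to recalculate.
    public boolean update(String name, Object value) {
        Object previous = values.put(name, value);
        // The rule won't fire at all until all variables have received values...
        if (!values.keySet().containsAll(required)) return false;
        if (!activated) {
            activated = true;
            recalculations++;
            return true;
        }
        // ...and thereafter, only when a primary-key variable changes value.
        if (pkVariables.contains(name) && !value.equals(previous)) {
            recalculations++;
            return true;
        }
        return false;
    }

    public int recalculationCount() { return recalculations; }
}
```

With SHARE_PRICE as the primary key and DOW as an ordinary variable, Dow updates merely refresh the cached value; only share-price changes (after the first full binding) trigger recalculation.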

To get the latest version of Pachinko incorporating this feature and sample JUnit tests exercising it, go here.

Thursday, November 17, 2011

PACHINKO high-speed embeddable rule engine

I just released the first version of PACHINKO into the wild. You can find it up on GitHub at:

Get Pachinko

Pachinko is a small-footprint Rete-inspired rule engine runtime designed to be embedded in your Java code. It accepts rules written in Java and executes them in a very fast, single-threaded fashion. It is lightweight enough that, for multicore applications, a separate instance can be run per thread, with rules accessing shared state.

Performance is currently (on a Core i7 MacBook) sub-200 nanoseconds for a single rule evaluation, and about 9 microseconds for evaluation of a single PrimaryKey rule in a corpus of 1000 rules on the same variable. (This latter number should improve considerably in the next version.)

Some of its interesting features include:

- It is built on top of the ROUX monadic function library, which is in large part why it is so fast. Thanks to this library, there is no copying of values between alpha and beta memories, and no hash lookups.

- Persistence of the rule state can be easily accommodated by persisting the changed monads between rule invocations.

- Rules can be written in plain Java code by subclassing DefaultCARule, or by dynamically assembling a monadic expression which is passed to the rule, allowing for lightweight UI-driven rule specification at runtime.

- Rules can be added or changed while the engine is running.

- Because it uses a Rete-inspired dependency graph for feeding state to the free variables of rules, considerable economies of both scaling and evaluation can be achieved. Rule conditions are not evaluated unless all free variables have current state present.

- Rule activation can be controlled in a manner amenable to performant stream processing. By default, rules activate when all their free variables are bound to a current value. However, if one or more variables are defined as PKVariables, they are used as keys: a rule with one or more PKVariables will not activate until all of its free variables are bound to a current value, and thereafter it will be re-activated whenever one of its PKVariables changes value. Changes in value to ordinary variables have no effect on activation.

- For the common case where a number of rules are defined on a single fact, but each is expected to fire only when the fact attains a certain value, there is an optimization available. If the variable is defined as a PKVariable with an ActivationValue, the rule condition will not be evaluated unless the value of the PKVariable equals its ActivationValue.

Here is a simple example of a Pachinko java rule:

public class StartEventRule extends DefaultCARule {
  int _event = -1;
  int _status = -1;

  public StartEventRule() {
    super();
    // EVENT is a primary key variable; its ActivationValue of "StartEvent"
    // means the rule condition is only evaluated when EVENT takes that value.
    _event = addPkVariable(new PKVariable("EVENT", "StartEvent"), "StartEvent");
    // STATUS is an ordinary variable, initially bound to "NOT_STARTED".
    _status = addOptionalVariable(new Variable("STATUS", "NOT_STARTED"));
  }

  @Override
  public boolean evaluateCondition(IMonadex context) {
    return context.bindValue(_event).equals("StartEvent");
  }

  @Override
  public void doAction(IReadWriteMonadex context) {
    context.returnValue(_status, "STARTED");
  }
}

And an example of the above rule in use:

@Test
public void simpleStartEventTest() {
  // Initialize rule system with its set of rules:
  CARuleSystem ruleSystem = new CARuleSystem(new StartEventRule());

  // Set some data into the rule system and process any resulting activations...
  IReadWriteMonadex readWriteContext = ruleSystem.freeVariables();
  readWriteContext.returnValue("EVENT", "IdleEvent");
  ruleSystem.executeActivations();

  // Verify that the rule system did not change state...
  assertEquals("NOT_STARTED", readWriteContext.getMonad("STATUS").bindValue(readWriteContext));

  // Now do it again, only with the expected value for EVENT...
  readWriteContext.returnValue("EVENT", "StartEvent");
  ruleSystem.executeActivations();

  // Verify that the rule system did change state this time:
  assertEquals("STARTED", readWriteContext.getMonad("STATUS").bindValue(readWriteContext));
}


Monday, September 29, 2008

Relationships as the Semantic Heart of a Language

I've been thinking a lot about intentional programming and the idea that behavior doesn't simply belong to an entity but to the collaboration of two entities and the surrounding context in which they find themselves.

It's an old pattern in database and data modeling to represent rich relationships between two entities with three entities rather than two, where the third entity contains the new information created by the relationship between the first two.
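In Java terms, the pattern looks something like this (a hypothetical Student/Course example; all names are purely illustrative):

```java
// The classic three-entity pattern: Student and Course each carry only their
// own attributes, while the Enrollment entity carries the information that
// exists only because of the relationship between them (the grade).
record Student(String id, String name) {}
record Course(String id, String title) {}
record Enrollment(String studentId, String courseId, String grade) {}
```

A grade belongs to neither the student nor the course alone; it only makes sense on the Enrollment that relates them.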

Dependency Injection is a mechanism for permitting this same basic pattern for OO programming languages, where the third entity is really the larger enclosing context in which both of the primary entities exist.

I'm chewing on an intersection of two interesting ideas that elaborate on this concept.

The first idea was inspired by reading maverick physicist Julian Barbour's book "The End of Time: The Next Revolution in Physics." One of Barbour's pet conceits is that there is no causality: all moments in time exist simultaneously as bubbles in a higher-order dimension, and causality is really the apparent ordering that becomes evident when you look in any direction across the sea of bubbles.

This notion got me thinking about entities being nested inside other entities, and the whole notion of hierarchical scope being turtles all the way down. Infinite hierarchical inclusion is a comfortingly attractive idea, but in practice it proves a stumbling block when trying to cope with cross-tree relationships. Instead of having a fundamental notion of all entities being defined somewhere in a single-rooted hierarchy which owns them, what if we instead said that all entities are their own single-node trees, living in their own worlds, and that their presence in any larger "enclosing" context is by relationship?

That gives us a model where container->contained relationships are not special cases; they are the normal case. And A->B relationships might be modeled as A->container->B. This is marginally less efficient than a simple A->B, but vastly more powerful, because concepts like friction, physics, and indirect coupling can now be modeled explicitly.

It also meshes nicely with my second thought, which is that instead of an OOP-driven point of view where behaviors belong to entities, it seems more rational to think of behaviors as belonging to relationships. So that what is possible for entity A to do or have done to it within context C, is a function of the relationship between A and C. Likewise, what A can do to B within enclosing context C is a function of the relationship of A->C and of C->B.
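As a rough sketch of this idea (all names here are illustrative, not a worked-out language design), behavior can be attached to the entity-context relationship rather than to the entity itself:

```java
import java.util.HashMap;
import java.util.Map;

// A rough sketch: what an entity can do is looked up on the relationship
// between the entity and its enclosing context, not on the entity itself.
public class RelationshipSketch {
    // (entity, context) pair -> the behavior available in that pairing
    private final Map<String, String> behaviors = new HashMap<>();

    public void relate(String entity, String context, String behavior) {
        behaviors.put(entity + "->" + context, behavior);
    }

    // What entity A can do inside context C is a property of A->C, so the
    // same entity can behave differently in different contexts.
    public String behaviorOf(String entity, String context) {
        return behaviors.getOrDefault(entity + "->" + context, "inert");
    }
}
```

The same entity exhibits different behavior in different contexts because the lookup key is the relationship, not the entity.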

I'll post the math for this when I get further along in working it out. It seems to me though that the confluence of these two simple thoughts is a powerful basis for a language that permits entities to be used differently in different contexts with higher levels of reuse and repurposing than is found in other systems.

Friday, August 01, 2008

Embedded Equation Test

Well, LaTeX equations work now, thanks to a 3rd party equation rendering JavaScript package. MathML is still problematic though.

A sample LaTeX equation:

\int_{0}^{1}\frac{x^{4}\left(1-x\right)^{4}}{1+x^{2}}dx
=\frac{22}{7}-\pi

The Sleeper Awakes

I know I've always said that this blog would be a sporadic affair, updated as and when I had a chance to come up for air. However two and a half years is stretching the point a bit.

I have a number of things I've been working on that I'll post the details on as I get a chance, including some ideas for a possibly novel approach to a computer language but I thought I would celebrate the re-inauguration of this blog with the invention of a new term.

I've coined a new sniglet, "hodgepourri," to describe the analytical approach many new web languages take to choosing their features.

This word can be used in a sentence:

Perl contains a hodgepourri of features.

My apologies, Larry Wall, but you know it's true.

Thursday, December 01, 2005

Mathematical Model for Database Dimension Systems

This is a paper I wrote several years ago while I was working at Certive but which is still interesting in general terms to people interested in a pragmatic definition of discrete and continuous dimensions as a basis for building analytical software systems.

I include it as a link because the number of equations included as .gifs has defeated the current incarnation of Blogger's nifty picture-post mechanism.

Tuesday, September 27, 2005

Brands are Related to Patterns

Thanks to Scott Wiener, here's an old (1991) paper by Richard Gabriel that discusses parallelism in QLisp and a construct called brands. Brands share a lot of philosophical similarity to the way I am seeking to implement support for patterns in Maverick.

If you've never read it, it's quite interesting.