Ron DuPlain, of the National Radio Astronomy Observatory (NRAO), has used PyCLIPS as a proof of concept for integrating a rule-based inference engine with existing software, in order to ensure the quality of the data produced by the GBT (Green Bank Telescope).

In a very clear presentation, available here, Ron shows how to build a complex system that ensures the data produced by the GBT meet the expected standards and the expectations of data consumers, in part by analysing the raw data with an expert system. Ron presented this work at ADASS (Astronomical Data Analysis Software and Systems) last year, and kindly allowed me to mention it here.

According to the presentation, the approach follows these principles:

  • Separate maintenance of rules from the traditional software components.
  • Provide a unified approach to data quality, whether online or offline.
  • Allow use of existing Python functions and libraries for testing data.
  • Isolate GBT-specific software components.

PyCLIPS plays two roles in this project: it encapsulates and drives the CLIPS engine, and it provides the external Python functions used to integrate with existing software. It also lets the developers focus on the different aspects of data analysis separately: the rules are independent of the procedural code, and rules can be added or removed without disrupting the rest of the ruleset.
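To make the separation of rules from procedural code more concrete, here is a minimal sketch in plain Python. It is not PyCLIPS and not CLIPS: it is a simple rule-evaluation loop with no pattern matching or chaining, written only to illustrate the idea. The helper functions (`has_nan`, `out_of_range`), the record fields (`samples`, `exposure`), and the thresholds are all hypothetical placeholders, not anything taken from the GBT system.

```python
"""Hypothetical sketch: quality rules kept as data, separate from the engine.

This is NOT PyCLIPS or CLIPS; it only mimics the separation of concerns
described above. All names, fields and thresholds are illustrative.
"""
import math
from dataclasses import dataclass
from typing import Callable, List


# "Existing" Python helpers (hypothetical) that rules may call, loosely
# analogous to external Python functions made available to a rule engine.
def has_nan(values: List[float]) -> bool:
    """Return True if any sample is NaN."""
    return any(math.isnan(v) for v in values)


def out_of_range(value: float, low: float, high: float) -> bool:
    """Return True if value lies outside the closed interval [low, high]."""
    return not (low <= value <= high)


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[dict], bool]  # fires when this returns True
    message: str                       # what to report when it fires


# The rule base: entries can be added or removed without touching check().
# The exposure bounds are arbitrary placeholder values.
RULES: List[Rule] = [
    Rule("nan-samples",
         lambda r: has_nan(r["samples"]),
         "samples contain NaN"),
    Rule("exposure-range",
         lambda r: out_of_range(r["exposure"], 0.1, 3600.0),
         "exposure outside expected range"),
]


def check(record: dict, rules: List[Rule]) -> List[str]:
    """Procedural side: apply every rule to one record, in order,
    and collect the messages of the rules that fire."""
    return [rule.message for rule in rules if rule.condition(record)]


if __name__ == "__main__":
    record = {"samples": [1.0, float("nan"), 2.0], "exposure": 0.0}
    print(check(record, RULES))
    # ['samples contain NaN', 'exposure outside expected range']
```

Dropping a rule is just a change to the rule list, for example `check(record, RULES[:1])` reports only the NaN problem, while `check` itself stays the same. A real inference engine such as CLIPS goes much further, matching facts against rule patterns and letting fired rules assert new facts that trigger other rules.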

The presentation is interesting and clear even from a non-astronomical point of view, and most of Ron's observations about data quality apply to many areas of data mining.