
Specification incomplete, not ready for implementation!

Goals

Ability to automatically produce:

  • simple clustering performance and scalability reports (M1)
  • simple clustering quality reports (M2)
  • easy integration with build tools (M1)

Such reports would let us catch performance and quality problems early on. Additionally, they could be included in the release artifacts to give users some insight into the algorithms' capabilities.

Specification

The benchmarking tool is best implemented as a command line application with the following features.

Performance and scalability measurements

  • Clustering time measurement (M1)
  • Best-effort memory footprint measurement (M1)
  • JVM warm-up, customizable number of warm-up rounds (M1)
  • Customizable number of benchmark runs
  • Maximum benchmark time per run, useful for larger data sets; a sketch of these measurements follows this list

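Below is a minimal sketch of how a single algorithm could be measured, covering warm-up rounds, timed benchmark runs, a best-effort memory reading and a per-run time budget. The ClusteringAlgorithm interface and all parameter names are hypothetical, not part of any existing API.

    /** Sketch of warmed-up, timed benchmark runs (all names hypothetical). */
    public class BenchmarkRunner
    {
        /** Stand-in for the algorithm under test. */
        public interface ClusteringAlgorithm
        {
            void cluster();
        }

        public static void run(ClusteringAlgorithm algorithm, int warmupRounds,
            int benchmarkRuns, long maxTimePerRunMillis)
        {
            // JVM warm-up: let the JIT compile the hot paths before measuring.
            for (int i = 0; i < warmupRounds; i++)
            {
                algorithm.cluster();
            }

            final Runtime runtime = Runtime.getRuntime();
            for (int i = 0; i < benchmarkRuns; i++)
            {
                // Best-effort memory footprint: request a GC and read used heap before/after.
                runtime.gc();
                final long memoryBefore = runtime.totalMemory() - runtime.freeMemory();

                final long start = System.currentTimeMillis();
                algorithm.cluster();
                final long elapsed = System.currentTimeMillis() - start;

                final long memoryAfter = runtime.totalMemory() - runtime.freeMemory();
                System.out.println("run=" + i + " time=" + elapsed + " ms, memory="
                    + Math.max(0, memoryAfter - memoryBefore) + " B");

                // Respect the per-run time budget, useful for larger data sets.
                if (elapsed > maxTimePerRunMillis)
                {
                    break;
                }
            }
        }
    }

The number of GC calls could be obtained by summing GarbageCollectorMXBean.getCollectionCount() over the beans returned by java.lang.management.ManagementFactory.getGarbageCollectorMXBeans().
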
Clustering quality measurements

With all the known caveats, two groups of metrics could be measured:

  • Cluster purity, topic coverage, document coverage (M2); a purity sketch follows this list
  • Some of the standard measures (M2)

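As an illustration of the first group, cluster purity could be computed along the lines of the sketch below, assuming each document carries a ground-truth topic label; the input representation is an assumption made for the example.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    /** Sketch of the cluster purity measure (input representation assumed). */
    public class PurityCalculator
    {
        /**
         * @param clusters each inner list holds the ground-truth topic labels of the
         *        documents assigned to one cluster
         * @return the fraction of documents that belong to the majority topic of
         *         their cluster, in the range [0, 1]
         */
        public static double purity(List<List<String>> clusters)
        {
            int totalDocuments = 0;
            int majorityMatches = 0;

            for (List<String> clusterLabels : clusters)
            {
                final Map<String, Integer> counts = new HashMap<String, Integer>();
                int max = 0;
                for (String label : clusterLabels)
                {
                    final Integer previous = counts.get(label);
                    final int count = (previous == null ? 1 : previous + 1);
                    counts.put(label, count);
                    max = Math.max(max, count);
                }
                totalDocuments += clusterLabels.size();
                majorityMatches += max;
            }
            return totalDocuments == 0 ? 0.0 : (double) majorityMatches / totalDocuments;
        }
    }
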
Storage of results

  • HSQLDB (M1). If no database directory is provided, the current working directory should be used (see the storage sketch after this list).
  • Each result row should contain:
    • Algorithm id
    • Attribute set id
    • Run key: a unique identifier of the whole invocation of the benchmarking tool. The run key can be provided externally (e.g. a release number, build number or SVN revision); if none is provided, a unique one must be generated (e.g. from the current date and time, or from a simple sequential number initialized from the maximum value in the database). If entries for the provided/generated key already exist, an error should be reported; if an appropriate command line switch is provided, the existing rows should be deleted instead. (M1)
    • Number of documents
    • Total size of input
    • Run time: date and time the benchmarking tool was started (same for all runs) (M1)
    • Used memory
    • Number of GC calls
  • Pruning of old results: removing results older than a specified date, useful when the tool is run as part of a build (M1)
  • JVM, machine and OS specification
  • Export of results to XML (M2)

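A storage sketch based on plain JDBC and an embedded HSQLDB database is shown below; the table layout, column names and database file name are assumptions rather than a final schema, and HSQLDB 2.x is assumed (for driver auto-loading and IF NOT EXISTS).

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.sql.Timestamp;

    /** Sketch of storing one benchmark result row in an embedded HSQLDB database. */
    public class ResultStore
    {
        public static void storeResult(String databaseDir, String runKey, String algorithmId,
            String attributeSetId, int documents, long inputSizeBytes, long timeMillis,
            long usedMemoryBytes, long gcCalls, Timestamp runTime) throws SQLException
        {
            // If no database directory is provided, use the current working directory.
            final String dir = (databaseDir != null ? databaseDir : System.getProperty("user.dir"));

            final Connection connection = DriverManager.getConnection(
                "jdbc:hsqldb:file:" + dir + "/benchmarks", "sa", "");
            try
            {
                // Assumed schema; IF NOT EXISTS requires HSQLDB 2.x.
                final Statement ddl = connection.createStatement();
                ddl.execute("CREATE TABLE IF NOT EXISTS results ("
                    + "run_key VARCHAR(64), algorithm_id VARCHAR(64), attribute_set_id VARCHAR(64), "
                    + "documents INT, input_size BIGINT, time_millis BIGINT, "
                    + "used_memory BIGINT, gc_calls BIGINT, run_time TIMESTAMP)");
                ddl.close();

                final PreparedStatement insert = connection.prepareStatement(
                    "INSERT INTO results VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)");
                insert.setString(1, runKey);
                insert.setString(2, algorithmId);
                insert.setString(3, attributeSetId);
                insert.setInt(4, documents);
                insert.setLong(5, inputSizeBytes);
                insert.setLong(6, timeMillis);
                insert.setLong(7, usedMemoryBytes);
                insert.setLong(8, gcCalls);
                insert.setTimestamp(9, runTime);
                insert.executeUpdate();
                insert.close();
            }
            finally
            {
                connection.close();
            }
        }
    }

With such a table, pruning of old results reduces to a single DELETE FROM results WHERE run_time < ? statement, and duplicate run keys can be detected with a SELECT before inserting.
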
Presentation of results

A fixed set of charts of two types (a chart rendering sketch follows the lists below):

Single-run charts

  • time = f(docs) across all algorithms in default settings
  • time = f(docs) across all attribute sets of an algorithm
  • mem = f(docs) across all algorithms in default settings
  • mem = f(docs) across all attribute sets of an algorithm
  • time = f(size) across all algorithms in default settings
  • time = f(size) across all attribute sets of an algorithm
  • mem = f(size) across all algorithms in default settings
  • mem = f(size) across all attribute sets of an algorithm

Historical charts

  • time = f(run-key) across 4 input sizes for each algorithm in default settings
  • memory = f(run-key) across 4 input sizes for each algorithm in default settings
  • gc = f(run-key) across 4 input sizes for each algorithm in default settings

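For the chart rendering itself, a library such as JFreeChart would be one option; the sketch below produces a single time = f(docs) chart. The use of JFreeChart is an assumption, and the algorithm names and data points are placeholders standing in for values read from the results database.

    import java.io.File;
    import java.io.IOException;

    import org.jfree.chart.ChartFactory;
    import org.jfree.chart.ChartUtilities;
    import org.jfree.chart.JFreeChart;
    import org.jfree.chart.plot.PlotOrientation;
    import org.jfree.data.xy.XYSeries;
    import org.jfree.data.xy.XYSeriesCollection;

    /** Sketch of rendering a time = f(docs) chart with JFreeChart (assumed library). */
    public class ChartRenderer
    {
        public static void main(String [] args) throws IOException
        {
            // One series per algorithm; the values are placeholders standing in for
            // rows queried from the results database.
            final XYSeries lingo = new XYSeries("Lingo");
            lingo.add(100, 250);
            lingo.add(1000, 2100);

            final XYSeries stc = new XYSeries("STC");
            stc.add(100, 120);
            stc.add(1000, 900);

            final XYSeriesCollection dataset = new XYSeriesCollection();
            dataset.addSeries(lingo);
            dataset.addSeries(stc);

            final JFreeChart chart = ChartFactory.createXYLineChart(
                "time = f(docs), default settings", "Documents", "Clustering time [ms]",
                dataset, PlotOrientation.VERTICAL, true, false, false);

            ChartUtilities.saveChartAsPNG(new File("time-vs-docs.png"), chart, 800, 600);
        }
    }
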
Implementation notes

Questions

  • Should/can we distribute the test data together with the tool?

Future ideas

  • Multiple document collections (short, medium and long documents)