The XCRF System is a Java library for labeling XML data. The current version of the system can label:

  • Element Nodes
  • Text Nodes
  • Attribute Nodes

An XCRF is defined in an XML file, which specifies:

  • The set of labels
  • Several subsets of labels (which can prove useful when defining feature functions)
  • The feature functions. Feature functions are 0-1 valued functions. Each one combines tests on the labels with tests on the observable XML input document. The tests on the input document are written as XPath expressions.
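To make the idea of a 0-1 valued feature function concrete, here is a minimal Java sketch. It is not the XCRF API: the class `FeatureSketch`, the method `feature`, and the labels `TITLE` and `PRICE` are hypothetical names chosen for illustration, and the sketch only covers a feature that tests the label of a single node. It uses the standard `javax.xml.xpath` package to evaluate an XPath test on a node of the input document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FeatureSketch {

    /**
     * Hypothetical 0-1 feature function (illustration only, not XCRF's API).
     * Returns 1 when the candidate label equals requiredLabel AND the XPath
     * test evaluates to true on the observed node; returns 0 otherwise.
     */
    static int feature(String requiredLabel, String xpathTest, String label, Node node) {
        if (!requiredLabel.equals(label)) {
            return 0; // label test failed
        }
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Boolean holds = (Boolean) xpath.evaluate(xpathTest, node, XPathConstants.BOOLEAN);
            return holds ? 1 : 0; // observation test
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XPath test: " + xpathTest, e);
        }
    }

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML input", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<book><title lang=\"en\">XML</title><price>12</price></book>");
        Node title = doc.getDocumentElement().getFirstChild();
        Node price = doc.getDocumentElement().getLastChild();
        String test = "self::title and @lang='en'";

        System.out.println(feature("TITLE", test, "TITLE", title)); // label and XPath test both hold
        System.out.println(feature("TITLE", test, "PRICE", title)); // label test fails
        System.out.println(feature("TITLE", test, "TITLE", price)); // XPath test fails
    }
}
```

Under these assumptions, the three calls in `main` print 1, 0 and 0. In a conditional random field, each such feature is paired with a real-valued weight, and it is those weights that parameter estimation adjusts.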

Parameter estimation is performed by maximizing the penalized log-likelihood. The value of the penalization parameter sigma can be modified in the file config.xml. The optimization uses the L-BFGS (limited-memory quasi-Newton) package from the RISO Project.
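For reference, a penalized log-likelihood of this kind is commonly written as below. This is a sketch under an assumption: it takes the penalty to be Gaussian (L2) with variance sigma squared, which the name of the parameter suggests but which is not stated here. The symbols are introduced for illustration: the lambda_k are the feature weights, and (x^(i), y^(i)) are the N training documents and their labelings.

```latex
\ell_\sigma(\lambda) \;=\; \sum_{i=1}^{N} \log p_\lambda\!\left(y^{(i)} \mid x^{(i)}\right) \;-\; \sum_{k} \frac{\lambda_k^{2}}{2\sigma^{2}}
```

Under this form, a smaller sigma penalizes large weights more strongly, and a larger sigma moves the objective closer to the unpenalized log-likelihood.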

The XCRF package can be downloaded here. Usage examples are included in the archive. Full documentation is on its way.

The XCRF project is partially supported by the European FEDER funds ("projet COCOA") and by the ANR projects MARMOTA and ATASH.



