The XCRF System
The XCRF System is a Java library for labeling XML data. The current
version of the system can label:
- Element Nodes
- Text Nodes
- Attribute Nodes
An XCRF is defined as an XML file. It allows the definition of:
- The set of labels
- Several subsets of labels (which can prove useful when defining
feature functions)
- The feature functions. Feature functions are 0-1 valued
functions. They are defined by tests on the labels, and tests on the
observable XML input document. These tests are defined using XPath
expressions.
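As an illustration of the idea only, the sketch below evaluates a 0-1 feature with the standard javax.xml.xpath API: the feature returns 1 when the candidate label matches and an XPath test holds at the node, and 0 otherwise. The class name XPathFeatureSketch, the method names, the label "TITLE" and the sample document are all hypothetical; this is not the XCRF API or its XML definition format.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration; none of these names come from XCRF itself.
final class XPathFeatureSketch {

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML input", e);
        }
    }

    /** Selects a single node of the document with an XPath expression. */
    static Node select(Document doc, String path) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Node) xpath.evaluate(path, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XPath: " + path, e);
        }
    }

    /**
     * 0-1 feature: returns 1 when the candidate label equals requiredLabel
     * (test on the label) and observationTest holds at the node
     * (test on the observable input), otherwise 0.
     */
    static int feature(Node node, String label,
                       String requiredLabel, String observationTest) {
        if (!requiredLabel.equals(label)) {
            return 0;
        }
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Boolean holds = (Boolean) xpath.evaluate(
                    observationTest, node, XPathConstants.BOOLEAN);
            return holds ? 1 : 0;
        } catch (Exception e) {
            throw new IllegalArgumentException(
                    "Invalid XPath: " + observationTest, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<book><title lang=\"en\">CRFs</title><year>2006</year></book>");
        String test = "name() = 'title' and @lang = 'en'";
        Node title = select(doc, "/book/title");
        Node year = select(doc, "/book/year");
        System.out.println(feature(title, "TITLE", "TITLE", test));
        System.out.println(feature(title, "YEAR", "TITLE", test));
        System.out.println(feature(year, "TITLE", "TITLE", test));
    }
}
```

The feature fires only for the first call, where both the label test and the XPath test succeed; the other two calls fail one test each.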
Parameter estimation is performed by maximizing the penalized
log-likelihood. The value of the penalization parameter sigma can be
modified in the file config.xml. The maximization is carried out with
the L-BFGS package from the RISO Project.
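For reference, a common form of such an objective is sketched below. It assumes a Gaussian (L2) penalty in which sigma plays the role of the standard deviation; the exact penalty used by XCRF is not stated here, and the symbols (training pairs x^(i), y^(i), weights lambda_k, one per feature function) are notation introduced for this illustration.

```latex
\mathcal{L}(\lambda)
  = \sum_{i=1}^{N} \log p_{\lambda}\bigl(y^{(i)} \mid x^{(i)}\bigr)
  - \sum_{k} \frac{\lambda_k^{2}}{2\sigma^{2}}
```

Under this assumption, a smaller sigma penalizes large weights more strongly, while a larger sigma brings the objective closer to the unpenalized log-likelihood.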
The XCRF package can be downloaded here. Examples of how to use
this package are available in the archive. Full documentation is on
its way.
The XCRF project is partially supported by European FEDER funds ("projet COCOA") and by the ANR projects
MARMOTA and ATASH.