The document is a manual for users of uimaFIT, a friendly API to the Apache UIMA framework.
Copyright © 2023 The Apache Software Foundation
License and Disclaimer
The ASF licenses this documentation to you under the Apache License, Version 2.0 (the "License"); you may not use this documentation except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, this documentation and its contents are distributed under the License on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Trademarks
All terms mentioned in the text that are known to be trademarks or service marks have been appropriately capitalized. Use of such terms in this book should not be regarded as affecting the validity of the trademark or service mark.
1. Introduction
While uimaFIT provides many features for a UIMA developer, there are two overarching themes that most features fall under. These two sides of uimaFIT are, while complementary, largely independent of each other. One of the beauties of uimaFIT is that a developer who uses one side of uimaFIT extensively is not required to use the other side at all.
1.1. Simplify Component Implementation
The first broad theme of uimaFIT provides features that simplify component implementation.
Our favorite example of this is the @ConfigurationParameter annotation, which allows you to annotate a member variable as a configuration parameter.
This annotation, in combination with the method ConfigurationParameterInitializer.initialize(), completely automates the process of initializing member variables with values from the UimaContext passed into your analysis engine’s initialize method.
Similarly, the @ExternalResource annotation, in combination with the method ExternalResourceInitializer.initialize(), completely automates the binding of an external resource as defined in the UimaContext to a member variable.
Dispensing with manually writing the code that performs these two tasks reduces effort, eliminates verbose and potentially buggy boiler-plate code, and makes implementing a UIMA component more enjoyable.
Consider, for example, a member variable that is of type Locale.
With uimaFIT you can simply annotate the member variable with @ConfigurationParameter and have your initialize method automatically initialize the variable correctly with a string value in the UimaContext such as en_US.
1.2. Simplify Component Instantiation
The second broad theme of uimaFIT provides features that simplify component instantiation. Working with UIMA, have you ever said to yourself “but I just want to tag some text!?” What does it take to “just tag some text?” Here’s a list of things you must do with the traditional approach:
- wrap your tagger as a UIMA analysis engine
- write a descriptor file for your analysis engine
- write a CAS consumer that produces the desired output
- write another descriptor file for the CAS consumer
- write a descriptor file for a collection reader
- write a descriptor file that describes a pipeline
- invoke the Collection Processing Manager with your pipeline descriptor file
1.2.1. From a class
Each of these steps has its own pitfalls and can be rather time consuming. This is a rather unsatisfying answer to our simple desire to just tag some text. With uimaFIT you can literally eliminate all of these steps.
Here’s a simple snippet of Java code that illustrates “tagging some text” with uimaFIT:
import static org.apache.uima.fit.factory.AnalysisEngineFactory.createEngineDescription;
import static org.apache.uima.fit.factory.JCasFactory.createJCas;
import static org.apache.uima.fit.pipeline.SimplePipeline.runPipeline;
import static org.apache.uima.fit.util.JCasUtil.select;

import org.apache.uima.jcas.JCas;

// MyTokenizer, MyTagger, and the Token type stand for your own components and types
JCas jCas = createJCas();
jCas.setDocumentText("some text");
runPipeline(jCas,
    createEngineDescription(MyTokenizer.class),
    createEngineDescription(MyTagger.class));
for (Token token : select(jCas, Token.class)) {
  System.out.println(token.getTag());
}
This code uses several static method imports for brevity. And while the terseness of this code won’t make a Python programmer blush, it is certainly much easier than the seven steps outlined above!
1.2.2. From an XML descriptor
uimaFIT provides mechanisms to instantiate and run UIMA components programmatically with or without descriptor files.
For example, if you have a descriptor file for the analysis engine defined by MyTagger (as shown above), then you can instead create its description from that file with:
AnalysisEngineDescription tagger = createEngineDescription(
"mypackage.MyTagger");
This looks up the descriptor file mypackage/MyTagger.xml on the classpath by name.
Similarly, you can load a descriptor file from a file system location with createEngineDescriptionFromPath().
However, if you want to dispense with XML descriptor files altogether (and you probably do), you can use the method createEngineDescription() with a class argument, as shown above.
One of the driving motivations for creating the second side of uimaFIT is our frustration with descriptor files and our desire to eliminate them.
Descriptor files are difficult to maintain because they are generally tightly coupled with Java code, they decay without warning, they are wearisome to test, and they proliferate, among other reasons.
1.3. Is this cheating?
One question that is often raised by new uimaFIT users is whether or not it breaks the UIMA way.
That is, does adopting uimaFIT lead me down a path of creating UIMA components and systems that are incompatible with the traditional UIMA approach? The answer to this question is no.
For starters, uimaFIT does not skirt the UIMA mechanism of describing components; it only skips the XML part of it.
For example, when the method createEngineDescription() is called (as shown above), an AnalysisEngineDescription is created for the analysis engine.
This is the same object type that is instantiated when a descriptor file is used.
So, instead of parsing XML to instantiate an analysis engine description, uimaFIT uses a factory method to instantiate it from method parameters.
One of the happy benefits of this approach is that for a given AnalysisEngineDescription you can generate an XML descriptor file using AnalysisEngineDescription.toXML().
So, uimaFIT actually provides a very simple and direct path for generating XML descriptor files rather than manually creating and maintaining them!
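A minimal sketch of this round trip, assuming uimaFIT's NoOpAnnotator (a component that does nothing) as the described class and a temporary file as the target; the DescriptorRoundTrip class is hypothetical. The description generated from the class is written with toXML() and then read back with createEngineDescriptionFromPath().

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.fit.component.NoOpAnnotator;
import org.apache.uima.fit.factory.AnalysisEngineFactory;

public class DescriptorRoundTrip {

    // Generates a description from a class, writes it as XML, and parses it back
    public static AnalysisEngineDescription roundTrip(File xmlFile) throws Exception {
        AnalysisEngineDescription generated =
                AnalysisEngineFactory.createEngineDescription(NoOpAnnotator.class);
        try (OutputStream out = new FileOutputStream(xmlFile)) {
            generated.toXML(out);
        }
        return AnalysisEngineFactory.createEngineDescriptionFromPath(xmlFile.getPath());
    }

    public static void main(String[] args) throws Exception {
        File xmlFile = File.createTempFile("NoOpAnnotator", ".xml");
        xmlFile.deleteOnExit();
        AnalysisEngineDescription reloaded = roundTrip(xmlFile);
        // The reloaded description still points at the original implementation class
        System.out.println(reloaded.getAnnotatorImplementationName());
    }
}
```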
It is also useful to clarify that if you only want to use one side or the other of uimaFIT, then you are free to do so.
This is possible precisely because uimaFIT does not work around UIMA’s mechanisms for describing components but rather uses them directly.
For example, if the only thing you want to use in uimaFIT is the @ConfigurationParameter annotation, then you can do so without worrying about what effect this will have on your descriptor files.
This is because your analysis engine will be initialized with exactly the same UimaContext regardless of whether you instantiate your analysis engine in the UIMA way or use one of uimaFIT’s factory methods.
Similarly, a UIMA component does not need to be annotated with @ConfigurationParameter for you to make use of the createEngineDescription() method.
This is because when you pass configuration parameter values to the createEngineDescription() method, they are added to an AnalysisEngineDescription, which is used by UIMA to populate a UimaContext, just as it would if you used a descriptor file.
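To illustrate, here is a sketch of a plain UIMA annotator that extends UIMA's own JCasAnnotator_ImplBase, uses no uimaFIT annotations, and reads its parameter from the UimaContext by hand, yet is still created through createEngineDescription(). The class and parameter names are hypothetical; the annotator copies the parameter into the document language so the value can be checked.

```java
import org.apache.uima.UimaContext;
import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceInitializationException;

public class PlainUimaComponentExample {

    // Classic UIMA annotator: no @ConfigurationParameter, context read manually
    public static class PlainLanguageSetter extends JCasAnnotator_ImplBase {
        public static final String PARAM_LANGUAGE = "language";
        private String language;

        @Override
        public void initialize(UimaContext context) throws ResourceInitializationException {
            super.initialize(context);
            language = (String) context.getConfigParameterValue(PARAM_LANGUAGE);
        }

        @Override
        public void process(JCas jCas) {
            jCas.setDocumentLanguage(language);
        }
    }

    // The factory adds the name/value pair to the description's parameter settings
    public static AnalysisEngineDescription describe(String language) throws Exception {
        return AnalysisEngineFactory.createEngineDescription(
                PlainLanguageSetter.class, PlainLanguageSetter.PARAM_LANGUAGE, language);
    }

    public static String run(String language) throws Exception {
        AnalysisEngine engine = AnalysisEngineFactory.createEngine(describe(language));
        JCas jCas = JCasFactory.createJCas();
        jCas.setDocumentText("Guten Tag");
        engine.process(jCas);
        engine.destroy();
        return jCas.getDocumentLanguage();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("de"));
    }
}
```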
1.4. Conclusion
Because uimaFIT can be used to simplify both component implementation and instantiation, it is easy to assume that you can’t do one without the other. This page has demonstrated that while these two sides of uimaFIT complement each other, they are not coupled together and each can be effectively used without the other. Similarly, by understanding how uimaFIT uses the UIMA component description mechanisms directly, one can be assured that uimaFIT enables UIMA development that is compatible and consistent with the UIMA standard and APIs.
2. Getting Started
This quick start tutorial demonstrates how to use uimaFIT to define and set a configuration parameter in an analysis engine, run it, and generate a descriptor file for it. The complete code for this example can be found in the uimafit-examples module.
2.1. Adding uimaFIT to your project
The following instructions describe how to add uimaFIT to your project’s classpath.
2.1.1. Maven users
If you use Maven, then uimaFIT can be added to your project as a dependency by adding the following snippet of XML to your pom.xml file:
<dependency>
<groupId>org.apache.uima</groupId>
<artifactId>uimafit-core</artifactId>
<version>3.6.0-SNAPSHOT</version>
</dependency>
uimaFIT distributions are hosted by Maven Central and so no repository needs to be added to your pom.xml file.
2.1.2. Non-Maven users
If you do not build with Maven, then download uimaFIT from the Apache UIMA downloads page. The file name should be uimafit—bin.zip. Download and unpack this file. The resulting unpacked directory will contain a directory called lib. Add all of the files in this directory to your classpath.
2.2. A simple analysis engine implementation
Here is the complete analysis engine implementation for this example.
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.descriptor.ConfigurationParameter;
import org.apache.uima.jcas.JCas;

public class GetStartedQuickAE
    extends org.apache.uima.fit.component.JCasAnnotator_ImplBase {

  public static final String PARAM_STRING = "stringParam";

  @ConfigurationParameter(name = PARAM_STRING)
  private String stringParam;

  @Override
  public void process(JCas jCas) throws AnalysisEngineProcessException {
    System.out.println("Hello world! Say 'hi' to " + stringParam);
  }
}
The first thing to note is that the member variable stringParam is annotated with @ConfigurationParameter, which tells uimaFIT that this is an analysis engine configuration parameter.
It is best practice to create a public constant for the parameter name, here PARAM_STRING.
The second thing to note is that we extend uimaFIT’s version of JCasAnnotator_ImplBase.
The initialize method of this superclass calls:
ConfigurationParameterInitializer.initialize(Object, UimaContext)
which populates the configuration parameters with the appropriate contents of the UimaContext.
If you do not want to extend uimaFIT’s JCasAnnotator_ImplBase, then you can call this method directly in the initialize method of your analysis engine or any class that implements Initializable.
You can call this method for an instance of any class that has configuration parameters.
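As a sketch of that last point, the following hypothetical TokenFilterSettings class is not a UIMA component at all. The example assumes a UimaContext built with uimaFIT's UimaContextFactory; inside a real component you would pass the context received in initialize() instead.

```java
import org.apache.uima.UimaContext;
import org.apache.uima.fit.component.initialize.ConfigurationParameterInitializer;
import org.apache.uima.fit.descriptor.ConfigurationParameter;
import org.apache.uima.fit.factory.UimaContextFactory;

public class InitializerExample {

    // Hypothetical plain class that merely declares a configuration parameter
    public static class TokenFilterSettings {
        public static final String PARAM_MIN_LENGTH = "minLength";

        @ConfigurationParameter(name = PARAM_MIN_LENGTH)
        private int minLength;

        public int getMinLength() {
            return minLength;
        }
    }

    public static TokenFilterSettings createSettings(int minLength) throws Exception {
        // Build a context holding the parameter, then let uimaFIT inject it
        UimaContext context = UimaContextFactory.createUimaContext(
                TokenFilterSettings.PARAM_MIN_LENGTH, minLength);
        TokenFilterSettings settings = new TokenFilterSettings();
        ConfigurationParameterInitializer.initialize(settings, context);
        return settings;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(createSettings(3).getMinLength());
    }
}
```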
2.3. Running the analysis engine
The following lines of code demonstrate how to instantiate and run the analysis engine from a main method:
JCas jCas = JCasFactory.createJCas();
AnalysisEngine analysisEngine = AnalysisEngineFactory.createEngine(
GetStartedQuickAE.class,
GetStartedQuickAE.PARAM_STRING, "uimaFIT");
analysisEngine.process(jCas);
In a more involved example, we would probably instantiate a collection reader and run this analysis engine over a collection of documents.
Here, it suffices to simply create a JCas.
The second statement instantiates the analysis engine using AnalysisEngineFactory and sets the string parameter named stringParam to the value uimaFIT.
Running this simple program sends the following output to the console:
Hello world! Say 'hi' to uimaFIT
Normally you would be using a type system with your analysis components. When using uimaFIT, it is easiest to keep your type system descriptors in your source folders and make them known to uimaFIT. To do so, create a file META-INF/org.apache.uima.fit/types.txt in a source folder and add references to all your type descriptors to the file, one per line. You can also use wildcards. For example:
classpath*:org/apache/uima/fit/examples/type/Token.xml
classpath*:org/apache/uima/fit/examples/type/Sentence.xml
classpath*:org/apache/uima/fit/examples/tutorial/type/*.xml
2.4. Generate a descriptor file
The following lines of code demonstrate how a descriptor file can be generated using the class definition:
AnalysisEngineDescription analysisEngineDescription =
    AnalysisEngineFactory.createEngineDescription(
        GetStartedQuickAE.class,
        GetStartedQuickAE.PARAM_STRING, "uimaFIT");
analysisEngineDescription.toXML(
    new FileOutputStream("GetStartedQuickAE.xml"));
If you open the resulting descriptor file, you will see that the configuration parameter stringParam is defined with the value set to uimaFIT.
We could now instantiate an analysis engine using this descriptor file with a line of code like this:
AnalysisEngineFactory.createEngine("GetStartedQuickAE");
But, of course, we really wouldn’t want to do that now that we can instantiate analysis engines using the class definition as was done above!
This chapter did not demonstrate every feature of uimaFIT, which also provides support for annotating external resources, creating aggregate engines, running pipelines, and testing components, among others.
3. Pipelines
UIMA is a component-based architecture that allows composing various processing components into a complex processing pipeline. A pipeline typically involves a collection reader which ingests documents and analysis engines that do the actual processing.
Normally, you would run a pipeline using a UIMA Collection Processing Engine or using UIMA AS. uimaFIT offers a third alternative that is much simpler to use and well suited for embedding UIMA pipelines into applications or for writing tests.
As uimaFIT does not supply any readers or processing components, we just assume that we have written three components:
- TextReader - reads text files from a directory
- Tokenizer - annotates tokens
- TokenFrequencyWriter - writes a list of tokens and their frequency to a file
We create descriptors for all components and run them as a pipeline:
CollectionReaderDescription reader =
CollectionReaderFactory.createReaderDescription(
TextReader.class,
TextReader.PARAM_INPUT, "/home/uimafit/documents");
AnalysisEngineDescription tokenizer =
AnalysisEngineFactory.createEngineDescription(
Tokenizer.class);
AnalysisEngineDescription tokenFrequencyWriter =
AnalysisEngineFactory.createEngineDescription(
TokenFrequencyWriter.class,
TokenFrequencyWriter.PARAM_OUTPUT, "counts.txt");
SimplePipeline.runPipeline(reader, tokenizer, tokenFrequencyWriter);
Instead of running the full pipeline end-to-end, we can also process one document at a time and inspect the analysis results:
CollectionReaderDescription reader =
CollectionReaderFactory.createReaderDescription(
TextReader.class,
TextReader.PARAM_INPUT, "/home/uimafit/documents");
AnalysisEngineDescription tokenizer =
AnalysisEngineFactory.createEngineDescription(
Tokenizer.class);
for (JCas jcas : SimplePipeline.iteratePipeline(reader, tokenizer)) {
System.out.printf("Found %d tokens%n",
JCasUtil.select(jcas, Token.class).size());
}
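Since TextReader and Tokenizer above are placeholders, here is a self-contained sketch of iteratePipeline(). It assumes a hypothetical StringArrayReader that emits one document per entry of a String[] parameter, uses uimaFIT's NoOpAnnotator as the only engine, and splits on whitespace in place of a real tokenizer. The iteration may reuse the same CAS instance for each document, so the results are extracted inside the loop.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.uima.collection.CollectionReaderDescription;
import org.apache.uima.fit.component.JCasCollectionReader_ImplBase;
import org.apache.uima.fit.component.NoOpAnnotator;
import org.apache.uima.fit.descriptor.ConfigurationParameter;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.factory.CollectionReaderFactory;
import org.apache.uima.fit.pipeline.SimplePipeline;
import org.apache.uima.jcas.JCas;
import org.apache.uima.util.Progress;
import org.apache.uima.util.ProgressImpl;

public class InMemoryPipelineExample {

    // Hypothetical reader: one document per entry of the "texts" parameter
    public static class StringArrayReader extends JCasCollectionReader_ImplBase {
        public static final String PARAM_TEXTS = "texts";

        @ConfigurationParameter(name = PARAM_TEXTS)
        private String[] texts;

        private int next = 0;

        @Override
        public boolean hasNext() {
            return next < texts.length;
        }

        @Override
        public void getNext(JCas jCas) {
            jCas.setDocumentText(texts[next++]);
        }

        @Override
        public Progress[] getProgress() {
            return new Progress[] { new ProgressImpl(next, texts.length, Progress.ENTITIES) };
        }

        @Override
        public void close() {
            // Nothing to release for an in-memory reader
        }
    }

    // Processes one document at a time and records a whitespace word count for each
    public static List<Integer> countWords(String... texts) throws Exception {
        CollectionReaderDescription reader = CollectionReaderFactory.createReaderDescription(
                StringArrayReader.class, StringArrayReader.PARAM_TEXTS, texts);
        List<Integer> counts = new ArrayList<>();
        for (JCas jCas : SimplePipeline.iteratePipeline(reader,
                AnalysisEngineFactory.createEngineDescription(NoOpAnnotator.class))) {
            counts.add(jCas.getDocumentText().trim().split("\\s+").length);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countWords("a b c", "one"));
    }
}
```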
4. Testing UIMA components
Writing tests without uimaFIT can be a laborious process that results in fragile tests that are very verbose and break easily when code is refactored. This page demonstrates how you can write tests that are both concise and robust. Here is an outline of how you might create a test for a UIMA component without uimaFIT:
1. write a descriptor file that configures your component appropriately for the test. This requires a minimum of 30-50 lines of XML.
2. begin a test with 5-10 lines of code that instantiate, e.g., the analysis engine.
3. run the analysis engine against some text and test the contents of the CAS.
4. repeat steps 1-3 for your next test, usually by copying the descriptor file, renaming it, and changing, e.g., configuration parameters.
If you have gone through the pain of creating tests like these and then decided you should refactor your code, then you know how tedious it is to maintain them.
Instead of pasting variants of the setup code (see step 2) into other tests we began to create a library of utility methods that we could call which helped shorten our code. We extended these methods so that we could instantiate our components directly without a descriptor file. These utility methods became the initial core of uimaFIT.
4.1. Examples
There are several examples that can be found in the uimafit-examples module.
- There are a number of examples of unit tests in both the test suite for the uimafit-core module and the uimafit-examples module. In particular, there are some well-documented unit tests in the latter which can be found in RoomNumberAnnotator1Test.
- You can improve your testing strategy by introducing a TestBase class such as the one found in ExamplesTestBase. This class is intended as a superclass for your other test classes and sets up a JCas that is always ready to use along with a TypeSystemDescription and a TypePriorities. An example test that subclasses from ExamplesTestBase is RoomNumberAnnotator2Test.
- Most analysis engines that you want to test will generally be downstream of many other components that add annotations to the CAS. These annotations will likely need to be in the CAS so that a downstream analysis engine will do something sensible. This poses a problem for tests because it may be undesirable to set up and run an entire pipeline every time you want to test a downstream analysis engine. Furthermore, such tests can become fragile in the face of behavior changes to upstream components. For this reason, it can be advantageous to serialize a CAS as an XMI file and use this as a starting point rather than running an entire pipeline. An example of this approach can be found in XmiTest.
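The XMI approach can be sketched with UIMA's XmiCasSerializer and XmiCasDeserializer. This hypothetical helper works on in-memory bytes to stay self-contained; in a real test the XMI would usually be written once to a file (e.g. under src/test/resources) and loaded at the start of each test. The type system of the loading CAS must be compatible with the one used when the XMI was written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import org.apache.uima.cas.impl.XmiCasDeserializer;
import org.apache.uima.cas.impl.XmiCasSerializer;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.jcas.JCas;

public class XmiStartingPointExample {

    // Serializes a prepared CAS (e.g. the output of upstream components) to XMI
    public static byte[] saveAsXmi(JCas jCas) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        XmiCasSerializer.serialize(jCas.getCas(), out);
        return out.toByteArray();
    }

    // Loads the XMI into a fresh JCas that can be handed to the component under test
    public static JCas loadFromXmi(byte[] xmi) throws Exception {
        JCas jCas = JCasFactory.createJCas();
        XmiCasDeserializer.deserialize(new ByteArrayInputStream(xmi), jCas.getCas());
        return jCas;
    }

    public static void main(String[] args) throws Exception {
        JCas prepared = JCasFactory.createJCas();
        prepared.setDocumentText("Room 2.13 is on the second floor.");
        JCas loaded = loadFromXmi(saveAsXmi(prepared));
        System.out.println(loaded.getDocumentText());
    }
}
```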
4.2. Tips & Tricks
The package org.apache.uima.fit.testing provides some utility classes that can be handy when writing tests for UIMA components. You may find the following suggestions useful:
- add a TokenBuilder to your TestBase class. An example of this can be found in ComponentTestBase. This makes it easy to add tokens and sentences to the CAS you are testing, which is a common task for many tests.
- use a JCasBuilder to add text and annotations incrementally to a JCas instead of first setting the text and then adding all annotations.
- use a CasDumpWriter to write the CAS contents in a human-readable format to a file or to the console. Compare this with a previously written and manually verified file to see whether changes in the component result in changes of the component’s output.
5. Validating CASes
The uimaFIT CAS validation feature allows you to define consistency rules for your type system and to automatically check that CASes comply with these rules.
5.1. Example use case
Imagine a system which uses machine learning to automatically identify persons in a text. Such a system might define an annotation type called Person having a feature called confidence of type float. However, a requirement of the system should be that the confidence score must be within the range from 0 to 1. Any value outside that range would probably be a bug in the system’s implementation. Now imagine that you want to implement not only one, but a bunch of different UIMA analysis engines, each based on a different machine learning approach, and plug these into the system. Instead of repeating the test code that checks the range of the confidence feature with each implementation, it would be much nicer if the range check could be included with the type system that all these implementations share. The unit tests should be able to pick up this check (and any other consistency checks) automatically and use them.
5.2. Defining a validation check
To define a validation check, all you need to do is to create a class implementing the org.apache.uima.fit.validation.CasValidationCheck interface. This interface defines a single method, List<ValidationResult> validate(CAS cas). Or, if you prefer working against the JCas API, you can implement the org.apache.uima.fit.validation.JCasValidationCheck interface.
Implementations of both interfaces (CasValidationCheck and JCasValidationCheck) can be applied to CAS as well as JCas instances, so it does not matter against which interface you build your check.
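import java.util.ArrayList;
import java.util.List;

import org.apache.uima.fit.util.JCasUtil;
import org.apache.uima.fit.validation.JCasValidationCheck;
import org.apache.uima.fit.validation.ValidationException;
import org.apache.uima.fit.validation.ValidationResult;
import org.apache.uima.jcas.JCas;

// Person is the JCas class generated for the example Person annotation type
public class ConfidenceRangeCheck implements JCasValidationCheck {
  @Override
  public List<ValidationResult> validate(JCas aJCas) throws ValidationException {
    List<ValidationResult> results = new ArrayList<>();
    for (Person person : JCasUtil.select(aJCas, Person.class)) {
      if (person.getConfidence() < 0.0d || person.getConfidence() > 1.0d) {
        results.add(ValidationResult.error(this, "Invalid confidence score (%f) on %s at [%d,%d]",
            person.getConfidence(), person.getType().getName(),
            person.getBegin(), person.getEnd()));
      }
    }
    return results;
  }
}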
Note: Checks are instantiated by the system as singletons. This means that their implementations must be stateless and must have a zero-argument constructor (or no constructor at all).
5.3. Registering the check for auto-detection
uimaFIT uses the Java service loader mechanism to locate validation check implementations. So to make a check available for auto-detection, its fully-qualified class name must be added to a file META-INF/services/org.apache.uima.fit.validation.ValidationCheck. Multiple checks can be added by putting each class name on a separate line.
5.4. Validating a CAS
The org.apache.uima.fit.validation.Validator class can be used to validate your (J)CASes. This class is typically constructed using a builder:
CAS cas = ...
// By default, the builder auto-detects all registered checks
Validator validator = new Validator.Builder().build();
// You could also pass in a JCas here instead of a CAS
ValidationSummary summary = validator.check(cas);
The output of a check is a ValidationSummary, which contains a bunch of ValidationResult items.
A ValidationResult is essentially a message with a severity level. When a summary contains any result with an error-level severity, the validation should be considered as failed.
The Validator.Builder can be configured, e.g. to exclude certain checks or to entirely disable the auto-detection of checks and instead work with only a set of explicitly specified checks.
6. Running Experiments
The uimafit-examples module contains a package org.apache.uima.fit.examples.experiment.pos which demonstrates a very simple experimental setup for testing a part-of-speech tagger. You may find this example more accessible if you check out the code from Subversion and build it in your own environment.
The documentation for this example can be found in the code itself.
Please refer to RunExperiment as a starting point.
The following is copied from the javadoc comments of that file:
RunExperiment demonstrates a very common (though simplified) experimental setup in which gold standard data is available for some task and you want to evaluate how well your analysis engine works against that data. Here we are evaluating BaselineTagger, which is a (ridiculously) simple part-of-speech tagger, against the part-of-speech tags found in src/main/resources/org/apache/uima/fit/examples/pos/sample-gold.txt.
The basic strategy is as follows:
- post the data as is into the default view,
- parse the gold-standard tokens and part-of-speech tags and put the results into another view we will call GOLD_VIEW,
- create another view called SYSTEM_VIEW and copy the text and Token annotations from the GOLD_VIEW into this view,
- run the BaselineTagger on the SYSTEM_VIEW over the copied Token annotations,
- evaluate the part-of-speech tags found in the SYSTEM_VIEW against those in the GOLD_VIEW.
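The first three steps can be sketched as follows. This is not the code of RunExperiment: it assumes a hypothetical gold format of whitespace-separated word/TAG pairs, and it keeps the gold tags in a plain Java list instead of Token annotations, because the Token type belongs to the example's own type system.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.jcas.JCas;

public class GoldSystemViewsExample {

    public static final String GOLD_VIEW = "GOLD_VIEW";
    public static final String SYSTEM_VIEW = "SYSTEM_VIEW";

    // Assumes well-formed input: every whitespace-separated item contains a '/'
    public static JCas prepare(String goldData, List<String> goldTags) throws Exception {
        JCas jCas = JCasFactory.createJCas();

        // Step 1: post the data as is into the default view
        jCas.setDocumentText(goldData);

        // Step 2: parse words and tags; the words become the GOLD_VIEW text
        StringBuilder text = new StringBuilder();
        for (String pair : goldData.trim().split("\\s+")) {
            int slash = pair.lastIndexOf('/');
            if (text.length() > 0) {
                text.append(' ');
            }
            text.append(pair, 0, slash);
            goldTags.add(pair.substring(slash + 1));
        }
        JCas goldView = jCas.createView(GOLD_VIEW);
        goldView.setDocumentText(text.toString());

        // Step 3: copy the text into SYSTEM_VIEW (Token annotations omitted in this sketch)
        JCas systemView = jCas.createView(SYSTEM_VIEW);
        systemView.setDocumentText(goldView.getDocumentText());
        return jCas;
    }

    public static void main(String[] args) throws Exception {
        List<String> tags = new ArrayList<>();
        JCas jCas = prepare("The/DT cat/NN sat/VBD", tags);
        System.out.println(jCas.getView(SYSTEM_VIEW).getDocumentText() + " " + tags);
    }
}
```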
7. CAS Utilities
uimaFIT facilitates working with the CAS and JCas by offering various convenient methods for accessing and navigating annotations and feature structures. Additionally, the convenience methods for JCas access are fully type-safe and return the JCas type or a collection of the JCas type which you wanted to access.
7.1. Access methods
uimaFIT supports the following convenience methods for accessing CAS and JCas structures. All methods respect the UIMA index definitions and return annotations or feature structures in the order defined by the indexes. Unless the default UIMA index for annotations has been overridden, annotations are returned sorted by begin (increasing) and end (decreasing).
- select(cas, type) - fetch all annotations of the given type from the CAS/JCas. Variants of this method also exist to fetch annotations from an FSList or FSArray.
- selectAll(cas) - fetch all annotations from the CAS or fetch all feature structures from the JCas.
- selectBetween(type, annotation1, annotation2)* - fetch all annotations between the given two annotations.
- selectCovered(type, annotation)* - fetch all annotations covered by the given annotation. If this operation is used intensively, indexCovered(…) should be used to pre-calculate annotation covering information.
- selectCovering(type, annotation)* - fetch all annotations covering the given annotation. If this operation is used intensively, indexCovering(…) should be used to pre-calculate annotation covering information.
- selectByIndex(cas, type, n) - fetch the n-th feature structure of the given type.
- selectSingle(cas, type) - fetch the single feature structure of the given type. An exception is thrown if there is not exactly one feature structure of the type.
- selectSingleRelative(type, annotation, n)* - fetch a single annotation relative to the given annotation. A positive n fetches the n-th annotation right of the specified annotation, while a negative n fetches to the left.
- selectPreceding(type, annotation, n)* - fetch the n annotations preceding the given annotation. If there are fewer than n preceding annotations, all preceding annotations are returned.
- selectFollowing(type, annotation, n)* - fetch the n annotations following the given annotation. If there are fewer than n following annotations, all following annotations are returned.
Note: For historical reasons, the methods marked with * also exist in a version that accepts a CAS/JCas as the first argument. These may not work as expected when the annotation arguments provided to the method are from a different CAS/JCas/view. Also, for any method accepting two annotations, these should come from the same CAS/JCas/view. In the future, the potentially problematic signatures may be deprecated, removed, or throw exceptions if these conditions are not met.
Note: You should expect the structures returned by these methods to be backed by the CAS/JCas contents.
In particular, removing feature structures from the CAS while iterating over these structures may cause failures.
For this reason, you should also not hold on to these structures longer than necessary, as is the case for UIMA
Depending on whether one works with a CAS or JCas, the respective methods are available from the JCasUtil or CasUtil classes.
JCasUtil expects a JCas wrapper class for the type argument, e.g. select(jcas, Token.class), and returns this type or a collection using this generic type.
Any subtypes of the specified type are returned as well.
CasUtil expects a UIMA Type instance.
For conveniently getting these, CasUtil offers the methods getType(CAS, Class<?>) or getType(CAS, String), which fetch a type either by its JCas wrapper class or by its name.
Unless annotations are specifically required, e.g. because begin/end offsets are required, the JCasUtil methods can be used to access any feature structure inheriting from TOP, not only annotations.
The CasUtil methods generally work only on annotations.
Alternative methods ending in "FS" are provided for accessing arbitrary feature structures, e.g. selectFS.
Examples:
// CAS version
Type tokenType = CasUtil.getType(cas, "my.Token");
for (AnnotationFS token : CasUtil.select(cas, tokenType)) {
...
}
// JCas version
for (Token token : JCasUtil.select(jcas, Token.class)) {
...
}
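For a self-contained CasUtil sketch, the types can be defined in code rather than in a descriptor. The type names my.Sentence and my.Token are hypothetical; the example fetches types by name with getType() and then uses selectByIndex() and selectCovered() from the list above.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.uima.UIMAFramework;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.text.AnnotationFS;
import org.apache.uima.fit.util.CasUtil;
import org.apache.uima.resource.metadata.TypeSystemDescription;
import org.apache.uima.util.CasCreationUtils;

public class CasUtilExample {

    // Builds a CAS with two hypothetical annotation types defined in code
    public static CAS createCas() throws Exception {
        TypeSystemDescription tsd = UIMAFramework.getResourceSpecifierFactory()
                .createTypeSystemDescription();
        tsd.addType("my.Sentence", "", CAS.TYPE_NAME_ANNOTATION);
        tsd.addType("my.Token", "", CAS.TYPE_NAME_ANNOTATION);
        CAS cas = CasCreationUtils.createCas(tsd, null, null);
        cas.setDocumentText("Hello world. Bye.");

        addAll(cas, CasUtil.getType(cas, "my.Sentence"), new int[][] { {0, 12}, {13, 17} });
        addAll(cas, CasUtil.getType(cas, "my.Token"),
                new int[][] { {0, 5}, {6, 11}, {11, 12}, {13, 16}, {16, 17} });
        return cas;
    }

    private static void addAll(CAS cas, Type type, int[][] spans) {
        for (int[] span : spans) {
            AnnotationFS annotation = cas.createAnnotation(type, span[0], span[1]);
            cas.addFsIndexes(annotation);
        }
    }

    // Returns the covered text of every token inside the first sentence
    public static List<String> tokensOfFirstSentence(CAS cas) {
        Type sentenceType = CasUtil.getType(cas, "my.Sentence");
        Type tokenType = CasUtil.getType(cas, "my.Token");
        AnnotationFS firstSentence = CasUtil.selectByIndex(cas, sentenceType, 0);
        List<String> texts = new ArrayList<>();
        for (AnnotationFS token : CasUtil.selectCovered(tokenType, firstSentence)) {
            texts.add(token.getCoveredText());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(tokensOfFirstSentence(createCas()));
    }
}
```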
8. Configuration Parameters
uimaFIT defines the @ConfigurationParameter annotation, which can be used to annotate the fields of an analysis engine or collection reader.
The purpose of this annotation is twofold:
- injection of parameters from the UIMA context into fields
- declaration of parameter metadata (mandatory, default value, description) which can be used to generate XML descriptors
In a regular UIMA component, parameters need to be manually extracted from the UIMA context, typically requiring a type cast.
class MyAnalysisEngine extends CasAnnotator_ImplBase {
  public static final String PARAM_SOURCE_DIRECTORY = "sourceDirectory";
  private File sourceDirectory;

  @Override
  public void initialize(UimaContext context)
      throws ResourceInitializationException {
    super.initialize(context);
    sourceDirectory = new File((String) context.getConfigParameterValue(
        PARAM_SOURCE_DIRECTORY));
  }
}
The component has no way to declare a default value or to declare whether a parameter is optional or mandatory. In addition, any documentation needs to be maintained both in the Javadoc and in the XML descriptor for the component.
With uimaFIT, all this information can be declared in the component using the @ConfigurationParameter annotation.
Parameter | Description | Default |
---|---|---|
name | parameter name | name of annotated field |
description | description of the parameter | |
mandatory | whether a non-null value must be specified | true |
defaultValue | the default value if no value is specified | |
class MyAnalysisEngine
extends org.apache.uima.fit.component.CasAnnotator_ImplBase {
/**
* Directory to read the data from.
*/
public static final String PARAM_SOURCE_DIRECTORY = "sourceDirectory";
@ConfigurationParameter(name=PARAM_SOURCE_DIRECTORY, defaultValue=".")
private File sourceDirectory;
}
Note that it is no longer necessary to implement the initialize() method.
uimaFIT takes care of locating the parameter sourceDirectory in the UIMA context.
It recognizes that the File class has a String constructor and uses that to instantiate a new File object from the parameter.
A parameter is mandatory unless specified otherwise.
If a mandatory parameter is not specified in the context, an exception is thrown.
The defaultValue is used when generating a UIMA component description from the class.
In particular, note that uimaFIT does not make use of the default value when injecting parameters into fields.
For this reason, it is possible to have a parameter that is mandatory but does have a default value.
The default value is used as a parameter value when a component description is generated via the uimaFIT factories, unless a value for the parameter is specified in the factory call.
If a component description is created manually without specifying a value for a mandatory parameter, uimaFIT will generate an exception.
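The interplay of defaultValue and the factories can be sketched with a hypothetical annotator that copies its parameter into the document language. Created through AnalysisEngineFactory without a value, it should receive the declared default via the generated description; with an explicit value in the factory call, that value takes precedence. The class and parameter names are illustrative only.

```java
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.descriptor.ConfigurationParameter;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.jcas.JCas;

public class DefaultValueExample {

    // Hypothetical annotator with a mandatory parameter that declares a default
    public static class LanguageSetter extends JCasAnnotator_ImplBase {
        public static final String PARAM_LANGUAGE = "language";

        @ConfigurationParameter(name = PARAM_LANGUAGE, defaultValue = "en")
        private String language;

        @Override
        public void process(JCas jCas) {
            jCas.setDocumentLanguage(language);
        }
    }

    // Creates the engine through the factory, optionally passing name/value pairs
    public static String run(Object... configurationData) throws Exception {
        AnalysisEngine engine = AnalysisEngineFactory.createEngine(
                LanguageSetter.class, configurationData);
        JCas jCas = JCasFactory.createJCas();
        jCas.setDocumentText("some text");
        engine.process(jCas);
        engine.destroy();
        return jCas.getDocumentLanguage();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
        System.out.println(run(LanguageSetter.PARAM_LANGUAGE, "de"));
    }
}
```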
Note: You can use the enhance goal of the uimaFIT Maven plugin to pick up the parameter description from the JavaDoc and post it to the
The parameter injection mechanism is implemented in the ConfigurationParameterInitializer class.
uimaFIT provides several base classes that already come with an initialize() method using the initializer:
- CasAnnotator_ImplBase
- CasCollectionReader_ImplBase
- CasConsumer_ImplBase
- CasFlowController_ImplBase
- CasMultiplier_ImplBase
- JCasAnnotator_ImplBase
- JCasCollectionReader_ImplBase
- JCasConsumer_ImplBase
- JCasFlowController_ImplBase
- JCasMultiplier_ImplBase
- Resource_ImplBase
The ConfigurationParameterInitializer can also be used with shared resources:
class MySharedResourceObject implements SharedResourceObject {
public static final String PARAM_VALUE = "Value";
@ConfigurationParameter(name = PARAM_VALUE, mandatory = true)
private String value;
public void load(DataResource aData)
throws ResourceInitializationException {
ConfigurationParameterInitializer.initialize(this, aData);
}
}
Fields annotated with the @ConfigurationParameter annotation can be of the primitive types (int, boolean, float, double) or String, or arrays or collections of these (including collections that are only typed via interfaces such as List or Set).
Enum types, as well as fields of the types Charset, File, Locale, Pattern, URI, and URL, can also be used.
These can be initialized either using an object value (e.g. StandardCharsets.UTF_8) or a string value (e.g. "UTF-8").
Additionally, it is possible to inject fields of any type that defines a constructor accepting a single String.
These must be initialized from a string value.
Multi-valued parameters can be initialized from single values without having to wrap these into a container.
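To make the conversion rules above more concrete, here is a minimal, JDK-only sketch of how a string value could be turned into some of the listed field types, and how a single value could be wrapped for a multi-valued parameter. It is not uimaFIT's implementation; the class ParamCoercionSketch and its methods are hypothetical, and only a subset of the supported types is covered.

```java
import java.nio.charset.Charset;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT uimaFIT's implementation: converts a parameter
// value given as a string into a few of the field types listed above.
class ParamCoercionSketch {

    static Object coerce(String value, Class<?> type) {
        if (type == String.class) {
            return value;
        }
        if (type == int.class || type == Integer.class) {
            return Integer.valueOf(value);
        }
        if (type == boolean.class || type == Boolean.class) {
            return Boolean.valueOf(value);
        }
        if (type == float.class || type == Float.class) {
            return Float.valueOf(value);
        }
        if (type == double.class || type == Double.class) {
            return Double.valueOf(value);
        }
        if (type.isEnum()) {
            // Enum constants are matched by their name
            for (Object constant : type.getEnumConstants()) {
                if (((Enum<?>) constant).name().equals(value)) {
                    return constant;
                }
            }
            throw new IllegalArgumentException(value + " is not a constant of " + type.getName());
        }
        // Charset and Pattern have no public String constructor,
        // so their factory methods are used instead
        if (type == Charset.class) {
            return Charset.forName(value);
        }
        if (type == Pattern.class) {
            return Pattern.compile(value);
        }
        // Fallback: any type with a public single-String constructor,
        // e.g. File, URI or your own value classes
        try {
            return type.getConstructor(String.class).newInstance(value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "Cannot convert \"" + value + "\" to " + type.getName(), e);
        }
    }

    // A single value given for a multi-valued parameter is wrapped into an array
    static Object[] asMultiValue(Object value) {
        return value instanceof Object[] ? (Object[]) value : new Object[] { value };
    }
}
```

The order of the checks matters: dedicated conversions are tried first, and the reflective single-String constructor is only the last resort.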
9. External Resources
An analysis engine often uses some data model. This may be as simple as word frequency counts or as complex as the model of a parser. Often these models can become quite large. If an analysis engine is deployed multiple times in the same pipeline or runs on multiple CPU cores, memory can be saved by using a shared instance of the data model. UIMA supports such a scenario through so-called external resources. The following sections illustrate how external resources can be used with uimaFIT.
First, create a class for the shared data model.
Usually this class would load its data from some URI and then expose it via its methods.
An example would be to load word frequency counts and to provide a getFrequency() method.
In our simple example we do not load anything from the provided URI; we just offer a method to get the URI from which data would be loaded.
// Simple model that only stores the URI it was loaded from. Normally data
// would be loaded from the URI instead and made accessible through methods
// in this class. This simple example only allows accessing the URI.
public static final class SharedModel implements SharedResourceObject {
    private String uri;
    public void load(DataResource aData)
            throws ResourceInitializationException {
        uri = aData.getUri().toString();
    }
    public String getUri() { return uri; }
}
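The getFrequency() model mentioned above could look roughly like the following JDK-only sketch. The class WordFrequencyModel and the tab-separated "word, count" input format are assumptions made for illustration. To use it as a UIMA shared resource, it would additionally implement SharedResourceObject, and its load(DataResource) method would call read(...) on a reader over aData.getInputStream(); that part is left out so the sketch runs without UIMA.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical word-frequency model. The "word<TAB>count" input format is an
// assumption made for this sketch, not a format defined by UIMA or uimaFIT.
class WordFrequencyModel {
    private final Map<String, Long> counts = new HashMap<>();

    // Reads one "word<TAB>count" entry per line; blank lines are skipped and
    // counts of repeated words are added up
    void read(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;
                }
                String[] fields = line.split("\t");
                if (fields.length != 2) {
                    throw new IllegalArgumentException("Malformed line: " + line);
                }
                counts.merge(fields[0], Long.parseLong(fields[1].trim()), Long::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns 0 for words that are not part of the model
    long getFrequency(String word) {
        return counts.getOrDefault(word, 0L);
    }
}
```

Keeping the parsing in a plain method separate from load(DataResource) makes the model easy to unit-test without setting up a UIMA context.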
9.1. Resource injection
9.1.1. Regular UIMA components
When an external resource is used in a regular UIMA component, it is usually fetched from the context, cast and copied to a class member variable.
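class MyAnalysisEngine extends CasAnnotator_ImplBase {
    final static String MODEL_KEY = "Model";
    private SharedModel model;
    public void initialize(UimaContext context)
            throws ResourceInitializationException {
        super.initialize(context);
        try {
            model = (SharedModel) context.getResourceObject(MODEL_KEY);
        } catch (ResourceAccessException e) {
            throw new ResourceInitializationException(e);
        }
    }
    public void process(CAS aCas) throws AnalysisEngineProcessException {
        // use the model here
    }
}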
uimaFIT can be used to inject external resources into such traditional components using the createDependencyAndBind() method.
To show that this works with any off-the-shelf UIMA component, the following example uses uimaFIT to configure the OpenNLP Tokenizer:
// Create descriptor
AnalysisEngineDescription tokenizer = createEngineDescription(
Tokenizer.class,
UimaUtil.TOKEN_TYPE_PARAMETER, Token.class.getName(),
UimaUtil.SENTENCE_TYPE_PARAMETER, Sentence.class.getName());
// Create the external resource dependency for the model and bind it
createDependencyAndBind(tokenizer, UimaUtil.MODEL_PARAMETER,
TokenizerModelResourceImpl.class,
"http://opennlp.sourceforge.net/models-1.5/en-token.bin");
Note
|
We recommend declaring parameter constants in the classes that use them, e.g.
here in |
Note
|
Note that uimaFIT is unable to perform type-coercion on parameters if a descriptor is created from a class that does not contain |
9.1.2. uimaFIT-aware components
uimaFIT provides the @ExternalResource annotation to inject external resources directly into class member variables.
Parameter | Description | Default |
---|---|---|
key | Resource key | field name |
api | Used when the external resource type is different from the field type, e.g. when using an ExternalResourceLocator | field type |
mandatory | Whether a value must be specified | true |
// Example annotator that uses the SharedModel. In process(), we only
// test if the model was properly initialized by uimaFIT
public static class Annotator
        extends org.apache.uima.fit.component.JCasAnnotator_ImplBase {
    final static String MODEL_KEY = "Model";
    @ExternalResource(key = MODEL_KEY)
    private SharedModel model;
    public void process(JCas aJCas) throws AnalysisEngineProcessException {
        assertTrue(model.getUri().endsWith("gene_model_v02.bin"));
        // Prints the instance ID to the console - this proves the same
        // instance of the SharedModel is used in both Annotator instances.
        System.out.println(model);
    }
}
Note that it is no longer necessary to implement the initialize() method.
uimaFIT takes care of locating the external resource Model in the UIMA context and assigning it to the field model.
If a mandatory resource is not present in the context, an exception is thrown.
The resource injection mechanism is implemented in the ExternalResourceInitializer class.
uimaFIT provides several base classes that already come with an initialize() method using the initializer:
-
CasAnnotator_ImplBase
-
CasCollectionReader_ImplBase
-
CasConsumer_ImplBase
-
CasFlowController_ImplBase
-
CasMultiplier_ImplBase
-
JCasAnnotator_ImplBase
-
JCasCollectionReader_ImplBase
-
JCasConsumer_ImplBase
-
JCasFlowController_ImplBase
-
JCasMultiplier_ImplBase
-
Resource_ImplBase
When building a pipeline, external resources can be set on a component just like configuration parameters. External resources and configuration parameters can be mixed and appear in any order when creating a component description.
Note that in the following example, we create only one external resource description and use it to configure two different analysis engines. Because only a single description is used, only a single instance of the external resource is created and shared between the two engines.
ExternalResourceDescription extDesc = createSharedResourceDescription(
        new File("gene_model_v02.bin"), SharedModel.class);
// Binding external resource to each Annotator individually
AnalysisEngineDescription aed1 = createEngineDescription(
        Annotator.class,
        Annotator.MODEL_KEY, extDesc);
AnalysisEngineDescription aed2 = createEngineDescription(
        Annotator.class,
        Annotator.MODEL_KEY, extDesc);
// Run both engines; each Annotator checks that the external resource was injected
AnalysisEngineDescription aaed = createEngineDescription(aed1, aed2);
AnalysisEngine ae = createEngine(aaed);
ae.process(ae.newJCas());
This example is given as a full JUnit-based example in the uimaFIT-examples project.
9.1.3. Resources extending Resource_ImplBase
One kind of resource extends Resource_ImplBase.
These are the easiest to handle, because uimaFIT’s version of Resource_ImplBase already implements the necessary logic.
Just be sure to call super.initialize() when overriding initialize().
Also mind that external resources are not yet available when initialize() is called.
For any initialization logic that requires resources, override and implement afterResourcesInitialized().
Other than that, injection of external resources works as usual.
public static class ChainableResource extends Resource_ImplBase {
    public final static String PARAM_CHAINED_RESOURCE = "chainedResource";
    @ExternalResource(key = PARAM_CHAINED_RESOURCE)
    private ChainableResource chainedResource;
    public void afterResourcesInitialized() {
        // init logic that requires external resources
    }
}
9.1.4. Resources implementing SharedResourceObject
The other kind of resource implements SharedResourceObject.
Since this is an interface, uimaFIT cannot provide the initialization logic, so you have to implement a couple of things in the resource:
-
implement ExternalResourceAware
-
declare a configuration parameter ExternalResourceFactory.PARAM_RESOURCE_NAME and return its value in getResourceName()
-
invoke ConfigurationParameterInitializer.initialize() in the load() method.
Again, mind that external resources are not properly initialized until uimaFIT invokes afterResourcesInitialized().
public class TestSharedResourceObject implements
        SharedResourceObject, ExternalResourceAware {
    @ConfigurationParameter(name=ExternalResourceFactory.PARAM_RESOURCE_NAME)
    private String resourceName;
    public final static String PARAM_CHAINED_RESOURCE = "chainedResource";
    @ExternalResource(key = PARAM_CHAINED_RESOURCE)
    private ChainableResource chainedResource;
    public String getResourceName() {
        return resourceName;
    }
    public void load(DataResource aData)
            throws ResourceInitializationException {
        ConfigurationParameterInitializer.initialize(this, aData);
        // rest of the init logic that does not require external resources
    }
    public void afterResourcesInitialized() {
        // init logic that requires external resources
    }
}
9.1.5. Note on injecting resources into resources
Nested resources are only initialized if they are used in a pipeline which contains at least one component that calls ConfigurationParameterInitializer.initialize().
Any component extending uimaFIT’s component base classes qualifies.
If you use nested resources in a pipeline without any uimaFIT-aware components, you can just add uimaFIT’s NoopAnnotator to the pipeline.
9.2. Resource locators
Normally, in UIMA an external resource needs to implement either SharedResourceObject or Resource.
In order to inject arbitrary objects, uimaFIT has the concept of ExternalResourceLocator.
When a resource implements this interface, it is not the resource itself that is injected; instead, the method getResource() is called on the resource and the result is injected.
The following example illustrates how to inject an object from JNDI into a UIMA component:
class MyAnalysisEngine2 extends JCasAnnotator_ImplBase {
    static final String RES_DICTIONARY = "dictionary";
    @ExternalResource(key = RES_DICTIONARY)
    Dictionary dictionary;
}
AnalysisEngineDescription desc = createEngineDescription(
        MyAnalysisEngine2.class);
bindResource(desc, MyAnalysisEngine2.RES_DICTIONARY,
        JndiResourceLocator.class,
        JndiResourceLocator.PARAM_NAME, "dictionaries/german");
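The locator indirection can be pictured with the following JDK-only sketch. SketchLocator, SupplierLocator and LocatorInjectionSketch are hypothetical stand-ins for ExternalResourceLocator and uimaFIT's injection logic, not the real API; the sketch only shows the decision rule described above: a bound locator is unwrapped via getResource(), any other resource is injected as-is.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for uimaFIT's ExternalResourceLocator interface
interface SketchLocator {
    Object getResource();
}

// Hypothetical locator that hands out an object produced by a Supplier,
// e.g. something looked up from JNDI or built from a configuration value
class SupplierLocator implements SketchLocator {
    private final Supplier<?> supplier;

    SupplierLocator(Supplier<?> supplier) {
        this.supplier = supplier;
    }

    @Override
    public Object getResource() {
        return supplier.get();
    }
}

class LocatorInjectionSketch {
    // Decides which object ends up in the annotated field: a locator is
    // unwrapped via getResource(), any other resource is injected as-is
    static Object valueToInject(Object boundResource) {
        if (boundResource instanceof SketchLocator) {
            return ((SketchLocator) boundResource).getResource();
        }
        return boundResource;
    }
}
```

This is also why the api parameter of @ExternalResource exists: with a locator, the type of the bound resource (the locator) differs from the type of the field (the located object).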
10. Type System Detection
UIMA requires that types that are used in the CAS are defined in XML files - so-called type system descriptions (TSD). Whenever a UIMA component is created, it must be associated with such a type system. While it is possible to manually load the type system descriptors and pass them to each UIMA component and to each created CAS, it is quite inconvenient to do so. For this reason, uimaFIT supports the automatic detection of such files in the classpath. This makes it possible for a UIMA component provider to have the component’s types detected automatically, so that the component becomes usable immediately after adding it to the classpath.
10.1. Making types auto-detectable
10.1.1. Using the Java Service Provider Interface
The Java Service Provider Interface (SPI) mechanism is a standard approach in Java for building extensible software. In our case, we want to make uimaFIT aware of type system descriptions, index definitions or type priority lists so that when we create a new CAS or analysis component, they are automatically pre-configured with these.
To enable this auto-detection, the UIMA Core Java SDK defines the following interfaces:
-
org.apache.uima.spi.FsIndexCollectionProvider
-
org.apache.uima.spi.TypePrioritiesProvider
-
org.apache.uima.spi.TypeSystemDescriptionProvider
Java code that wants to announce types, indexes or type priorities must implement one or more of these interfaces in a provider class. The following example covers type system descriptions; indexes and type priorities work in the same way.
The following provider class publishes the types from a type system description XML file that it can access via the classpath at /org/apache/uima/examples/types/TypeSystem.xml:
import static java.util.Arrays.asList;
import static org.apache.uima.fit.factory.TypeSystemDescriptionFactory.createTypeSystemDescription;
import java.util.List;
import org.apache.uima.resource.metadata.TypeSystemDescription;
import org.apache.uima.spi.TypeSystemDescriptionProvider;
public class MyTypeSystemProvider implements TypeSystemDescriptionProvider {
    @Override
    public List<TypeSystemDescription> listTypeSystemDescriptions() {
        return asList(createTypeSystemDescription("org.apache.uima.examples.types.TypeSystem"));
    }
}
You may also consider a slightly more advanced implementation that pre-resolves any imports that may be contained in the loaded descriptors. This is important if you are in an environment with multiple classloaders, such as an OSGi environment.
import static java.util.Arrays.asList;
import static org.apache.uima.fit.factory.TypeSystemDescriptionFactory.createTypeSystemDescription;
import java.util.Collections;
import java.util.List;
import org.apache.uima.UIMAFramework;
import org.apache.uima.resource.ResourceManager;
import org.apache.uima.resource.impl.ResourceManager_impl;
import org.apache.uima.resource.metadata.TypeSystemDescription;
import org.apache.uima.spi.TypeSystemDescriptionProvider;
import org.apache.uima.util.InvalidXMLException;
public class MyAdvancedTypeSystemProvider implements TypeSystemDescriptionProvider {
    @Override
    public List<TypeSystemDescription> listTypeSystemDescriptions() {
        ResourceManager resMgr = new ResourceManager_impl(getClass().getClassLoader());
        try {
            TypeSystemDescription tsd = createTypeSystemDescription(
                    "org.apache.uima.examples.types.TypeSystem");
            tsd.resolveImports(resMgr);
            return asList(tsd);
        } catch (InvalidXMLException e) {
            UIMAFramework.getLogger().error("Unable to load type system", e);
            return Collections.emptyList();
        } finally {
            resMgr.destroy();
        }
    }
}
In a proper implementation, you may want to use better error handling, use your own loggers instead of the framework logger, load additional type system descriptions, etc.
Once the provider class has been implemented, it needs to be registered with the SPI mechanism.
To do that, create a text file in META-INF/services that is named after the fully qualified name of the implemented interface, e.g. META-INF/services/org.apache.uima.spi.TypeSystemDescriptionProvider.
In that file, add the fully qualified name of the provider class, e.g. foo.bar.MyTypeSystemProvider.
If you have multiple provider classes for the given interface, add them all, one class per line.
10.1.2. Legacy approach
The provider of a type system should create a file META-INF/org.apache.uima.fit/types.txt in the classpath.
This file should define the locations of the type system descriptions.
Assume that a type org.apache.uima.fit.type.Token is specified in the TSD org/apache/uima/fit/type/Token.xml; then the file should have the following contents:
classpath*:org/apache/uima/fit/type/Token.xml
Note
|
Mind that the file types.txt must be located in META-INF/org.apache.uima.fit, where org.apache.uima.fit is the name of a sub-directory inside META-INF. We are not using the Java package notation here! |
To specify multiple TSDs, add additional lines to the file. If you have a large number of TSDs, you may prefer to add a pattern. Assuming that we have a large number of TSDs under org/apache/uima/fit/type, we can use the following pattern, which recursively scans the package org.apache.uima.fit.type and all sub-packages for XML files and tries to load them as TSDs.
classpath*:org/apache/uima/fit/type/**/*.xml
Try to design your package structure in a way that TSDs and the JCas wrapper classes generated from them are separate from the rest of your code.
If it is not possible or inconvenient to add the types.txt file, patterns can also be specified using the system property org.apache.uima.fit.type.import_pattern.
Multiple patterns may be specified separated by semicolon:
-Dorg.apache.uima.fit.type.import_pattern=\
classpath*:org/apache/uima/fit/type/**/*.xml
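If you prefer to set the property from Java code, for instance in a test fixture, a minimal sketch could look like the following. It assumes that the property is set before uimaFIT scans for type systems for the first time, since the scan results are cached (see the performance note in this chapter). The class TypeImportPatternSetup is hypothetical.

```java
// Hypothetical helper: sets the uimaFIT type import pattern property from code.
// Multiple patterns are joined with semicolons, as described above.
class TypeImportPatternSetup {
    static final String PROPERTY = "org.apache.uima.fit.type.import_pattern";

    static void usePatterns(String... patterns) {
        System.setProperty(PROPERTY, String.join(";", patterns));
    }
}
```

For example, TypeImportPatternSetup.usePatterns("classpath*:org/apache/uima/fit/type/**/*.xml", "classpath*:org/foobar/typesystems/**/*.xml") sets both patterns at once; the second location is only an illustration.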
Note
|
The |
10.2. Making index definitions and type priorities auto-detectable
Auto-detection also works for index definitions and type priority definitions. For index definitions, the respective file where to register the index definition XML files is META-INF/org.apache.uima.fit/fsindexes.txt and for type priorities, it is META-INF/org.apache.uima.fit/typepriorities.txt.
10.3. Using type auto-detection
The auto-detected type system can be obtained from the TypeSystemDescriptionFactory:
TypeSystemDescription tsd =
    TypeSystemDescriptionFactory.createTypeSystemDescription();
Popular factory methods also support auto-detection:
AnalysisEngine ae = createEngine(MyEngine.class);
10.4. Multiple META-INF/org.apache.uima.fit/types.txt files
uimaFIT supports multiple types.txt files in the classpath (e.g. in different JARs). The types.txt files are located via Spring using the classpath search pattern:
TYPE_MANIFEST_PATTERN = "classpath*:META-INF/org.apache.uima.fit/types.txt"
This resolves to a list of URLs pointing to ALL types.txt files. The resolved URLs are unique and will point either to a specific point in the file system or into a specific JAR. These URLs can be handled by the standard Java URL loading mechanism. Example:
jar:file:/path/to/syntax-types.jar!/META-INF/org.apache.uima.fit/types.txt
jar:file:/path/to/token-types.jar!/META-INF/org.apache.uima.fit/types.txt
uimaFIT then reads all patterns from all of these URLs and uses them to search the classpath again.
The patterns now resolve to a list of URLs pointing to the individual type system XML descriptors.
All of these URLs are collected in a set to avoid duplicate loading (for performance optimization - not strictly necessary because the UIMA type system merger can handle compatible duplicates). Then the descriptors are loaded into memory and merged using the standard UIMA type system merger (CasCreationUtils.mergeTypeSystems()). Example:
jar:file:/path/to/syntax-types.jar!/desc/types/Syntax.xml
jar:file:/path/to/token-types.jar!/org/foobar/typesystems/Tokens.xml
Voilà, the result is a type system covering all types that could be found in the classpath.
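The de-duplication step described above can be sketched as follows. This is a hypothetical helper for illustration, not uimaFIT code. It keeps locations as strings rather than java.net.URL objects, because URL.equals() and URL.hashCode() may trigger host name resolution, which makes URLs poor set elements; whether uimaFIT itself does this is not stated here.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch of the de-duplication step only (hypothetical helper, not uimaFIT code)
class DescriptorLocationCollector {
    // Merges the descriptor locations resolved from all patterns, dropping
    // duplicates while keeping the order in which locations were first seen
    static Set<String> uniqueLocations(Collection<List<String>> resolvedPerPattern) {
        Set<String> unique = new LinkedHashSet<>();
        for (List<String> locations : resolvedPerPattern) {
            unique.addAll(locations);
        }
        return unique;
    }
}
```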
It is recommended
-
to put type system descriptors into packages resembling a namespace you "own" and to use a package-scoped wildcard search
classpath*:org/apache/uima/fit/type/**/*.xml
-
or, when putting descriptors into a "well-known" package like desc.type, to explicitly list all type system descriptors in the types.txt file instead of using a wildcard search
classpath*:desc/type/Token.xml
classpath*:desc/type/Syntax.xml
The first method should be preferred. Both methods can be mixed.
10.5. Performance note and caching
Currently, uimaFIT evaluates the patterns for TSDs once and caches the locations, but not the actual merged type system description.
A rescan can be forced using TypeSystemDescriptionFactory.forceTypeDescriptorsScan().
This may change in the future.
10.6. Potential problems
The mechanism works fine. However, there are specific issues with Java in general that one should be aware of.
10.6.1. m2eclipse fails to copy descriptors to target/classes
There seems to be a bug in some older versions of m2eclipse that causes resources not always to be copied to target/classes. If UIMA complains about type definitions missing at runtime, try to clean/rebuild your project and carefully check the m2eclipse console in the console view for error messages that might cause m2eclipse to abort.
10.6.2. Class version conflicts
A problem can occur if you end up having multiple incompatible versions of the same type system in the classpath.
This is a general problem and not related to the auto-detection feature.
It is the same as when you have incompatible versions of a particular class (e.g. a JCas wrapper or some third-party library) in the classpath.
The behavior of the Java Classloader is undefined in that case.
The detection will do its best to try and load everything it can find, but the UIMA type system merger may fail, or you may end up with undefined behavior at runtime because one of the class versions is used at random.
10.6.3. Classes and resources in the default package
It is bad practice to place classes into the default (unnamed) package. In fact it is not possible to import classes from the default package in another class. Similarly it is a bad idea to put resources at the root of the classpath. The Spring documentation on resources explains this in detail.
For this reason, the types.txt file resides in /META-INF/org.apache.uima.fit, and it is suggested that type system descriptors reside either in a proper package like /org/foobar/typesystems/XXX.xml or in /desc/types/XXX.xml.
11. Building an executable JAR
Building an executable JAR including uimaFIT components typically requires extra care. By convention, uimaFIT expects certain information in specific locations on the classpath, e.g. the types.txt file that controls the automatic type system detection mechanism must reside at META-INF/org.apache.uima.fit/types.txt. It often occurs that a project has several dependencies, each supplying its own configuration files at these standard locations. However, this causes a problem with naive approaches to creating an executable fat-jar that merges all dependencies into a single JAR file. Without extra care, the files supplied by the different dependencies overwrite each other during the packaging process and only one file wins in the end. As a consequence, the types configured in the other files cannot be detected at runtime. Such a naive approach is taken, for example, by the Maven Assembly Plugin.
The Maven Shade Plugin provides a convenient alternative for the creation of executable fat-jars, as it provides a mechanism to concatenate the configuration files from different dependencies while creating the fat-jar.
To use the Maven Shade Plugin with uimaFIT, use the following configuration section in your POM file and make sure to change the mainClass as required for your project:
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.2</version>
<executions>
<execution>
<phase>package</phase>
<goals><goal>shade</goal></goals>
<configuration>
<transformers>
<!-- Set the main class of the executable JAR -->
<transformer
implementation="org.apache.maven.plugins.shade.\
resource.ManifestResourceTransformer">
<mainClass>org.apache.uima.fit.example.Main</mainClass>
</transformer>
<!-- Merge the uimaFIT configuration files -->
<transformer
implementation="org.apache.maven.plugins.shade.\
resource.AppendingTransformer">
<resource>\
META-INF/org.apache.uima.fit/fsindexes.txt\
</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.\
resource.AppendingTransformer">
<resource>\
META-INF/org.apache.uima.fit/types.txt\
</resource>
</transformer>
<transformer
implementation="org.apache.maven.plugins.shade.\
resource.AppendingTransformer">
<resource>\
META-INF/org.apache.uima.fit/typepriorities.txt\
</resource>
</transformer>
<!-- Merge CAS validation check registrations -->
<transformer
implementation="org.apache.maven.plugins.shade.\
resource.ServicesResourceTransformer"/>
</transformers>
<!--
Prevent huge shaded artifacts from being deployed
to a Maven repository (remove if not desired)
-->
<outputFile>\
${project.build.directory}/\
${artifactId}-${version}-standalone.jar\
</outputFile>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
Note
|
Due to formatting constraints in the PDF version of this manual, the example above uses |
Note
|
You might want to consider also merging additional files, such as LICENSE, NOTICE, or DEPENDENCY files, configuration files for the Java Service Locator API, or files used by other frameworks that use similar conventions for configuration file locations. Check the documentation of the Maven Shade Plugin, as different kinds of configuration files require different specialized transformers. |
12. uimaFIT Maven Plugin
uimaFIT dynamically generates UIMA component descriptions from annotations in the Java source code. The uimaFIT Maven plugin provides the ability to automatically create such annotations in already compiled classes and to automatically generate XML descriptors from the annotated classes.
12.1. enhance goal
The goal enhance allows automatically augmenting compiled classes with uimaFIT annotations. Information like vendor, copyright, or version can be obtained from the Maven POM. Additionally, descriptions for parameters and components can be generated from Javadoc comments. Existing annotations are not overwritten unless forced.
<plugin>
<groupId>org.apache.uima</groupId>
<artifactId>uimafit-maven-plugin</artifactId>
<version></version> <!-- change to latest version -->
<configuration>
<!-- OPTIONAL -->
<!-- Override component description in generated descriptors. -->
<overrideComponentDescription>false</overrideComponentDescription>
<!-- OPTIONAL -->
<!-- Override version in generated descriptors. -->
<overrideComponentVersion>false</overrideComponentVersion>
<!-- OPTIONAL -->
<!-- Override vendor in generated descriptors. -->
<overrideComponentVendor>false</overrideComponentVendor>
<!-- OPTIONAL -->
<!-- Override copyright in generated descriptors. -->
<overrideComponentCopyright>false</overrideComponentCopyright>
<!-- OPTIONAL -->
<!-- Version to use in generated descriptors. -->
<componentVersion>${project.version}</componentVersion>
<!-- OPTIONAL -->
<!-- Vendor to use in generated descriptors. -->
<componentVendor>Apache Foundation</componentVendor>
<!-- OPTIONAL -->
<!-- Copyright to use in generated descriptors. -->
<componentCopyright>Apache Foundation 2013</componentCopyright>
<!-- OPTIONAL -->
<!-- Source file encoding. -->
<encoding>${project.build.sourceEncoding}</encoding>
<!-- OPTIONAL -->
<!-- Generate a report of missing meta data in
$project.build.directory/uimafit-missing-meta-data-report.txt -->
<generateMissingMetaDataReport>true</generateMissingMetaDataReport>
<!-- OPTIONAL -->
<!-- Fail on missing meta data. This setting has no effect unless
generateMissingMetaDataReport is enabled. -->
<failOnMissingMetaData>false</failOnMissingMetaData>
<!-- OPTIONAL -->
<!-- Constant name prefixes used for parameters,
e.g. "PARAM_". -->
<parameterNameConstantPrefixes>
<prefix>PARAM_</prefix>
</parameterNameConstantPrefixes>
<!-- OPTIONAL -->
<!-- Constant name prefixes used for external resource keys,
e.g. "RES_" or "KEY_". -->
<externalResourceNameConstantPrefixes>
<prefix>KEY_</prefix>
<prefix>RES_</prefix>
</externalResourceNameConstantPrefixes>
<!-- OPTIONAL -->
<!-- Mode of adding type systems found on the classpath via the
uimaFIT detection mechanism at compile time to the generated
descriptor. By default, no type systems are added. -->
<addTypeSystemDescriptions>NONE</addTypeSystemDescriptions>
</configuration>
<executions>
<execution>
<id>default</id>
<phase>process-classes</phase>
<goals>
<goal>enhance</goal>
</goals>
</execution>
</executions>
</plugin>
When generating descriptions for configuration parameters or external resources, the plugin supports a common practice of placing the Javadoc on a constant field instead of the parameter or external resource field.
By default, parameter name constants must be prefixed with PARAM_ and external resource key constants must be prefixed with RES_ or KEY_.
/**
* Enable or disable my feature.
*/
public static final String PARAM_ENABLE_FEATURE = "enableFeature";
@ConfigurationParameter(name=PARAM_ENABLE_FEATURE)
private boolean enableFeature;
/**
* My external resource.
*/
public static final String RES_MY_RESOURCE = "resource";
@ExternalResource(key=RES_MY_RESOURCE)
private MyResource resource;
Enabling generateMissingMetaDataReport produces a report about missing meta data, such as parameter descriptions, in uimafit-missing-meta-data-report.txt in the project build directory.
If failOnMissingMetaData is enabled as well, the build fails when meta data is missing.
12.2. generate goal
The generate goal generates XML component descriptors for UIMA components.
<plugin>
<groupId>org.apache.uima</groupId>
<artifactId>uimafit-maven-plugin</artifactId>
<version></version> <!-- change to latest version -->
<configuration>
<!-- OPTIONAL -->
<!-- Path where the generated resources are written. -->
<outputDirectory>
${project.build.directory}/generated-sources/uimafit
</outputDirectory>
<!-- OPTIONAL -->
<!-- Skip generation of META-INF/org.apache.uima.fit/components.txt -->
<skipComponentsManifest>false</skipComponentsManifest>
<!-- OPTIONAL -->
<!-- Source file encoding. -->
<encoding>${project.build.sourceEncoding}</encoding>
</configuration>
<executions>
<execution>
<id>default</id>
<phase>process-classes</phase>
<goals>
<goal>generate</goal>
</goals>
</execution>
</executions>
</plugin>
In addition to the XML descriptors, a manifest file is written to META-INF/org.apache.uima.fit/components.txt.
This file can be used to conveniently locate the XML descriptors, which are written in the packages next to the classes they describe.
classpath*:org/apache/uima/fit/examples/ExampleComponent.xml
It is recommended to use both the enhance and the generate goals. Both goals should be specified in the same execution, first enhance, then generate:
<execution>
<id>default</id>
<phase>process-classes</phase>
<goals>
<goal>enhance</goal>
<goal>generate</goal>
</goals>
</execution>
13. Migration Guide
This section provides helpful information on incompatible changes between versions.
13.1. Version 3.0.x to 3.1.x
The renaming of methods in the ExternalResourceFactory had unfortunately introduced another name clash between unrelated methods. To fix this clash, the following bindResource methods have been renamed, mostly to bindResourceOnce:
-
void bindResource(ResourceCreationSpecifier aDesc, String aBindTo, ExternalResourceDescription aRes) was removed and replaced by void bindResourceOnce(ResourceCreationSpecifier aDesc, String aBindTo, ExternalResourceDescription aRes)
-
void bindResource(ExternalResourceDescription aRes, String aBindTo, ExternalResourceDescription aNestedRes) was deprecated and replaced by void bindResourceOnce(ExternalResourceDescription aRes, String aBindTo, ExternalResourceDescription aNestedRes)
-
void bindResource(ResourceManagerConfiguration aResMgrCfg, String aBindTo, ExternalResourceDescription aRes) was deprecated and replaced by void bindResourceOnce(ResourceManagerConfiguration aResMgrCfg, String aBindTo, ExternalResourceDescription aRes)
-
void bindResource(ResourceCreationSpecifier aDesc, String aBindTo, String aRes) was removed and replaced by void bindResourceOnceWithoutNested(ResourceCreationSpecifier aDesc, String aBindTo, String aRes)
-
void bindResource(ResourceManagerConfiguration aResMgrCfg, String aBindTo, String aRes) was deprecated and replaced by void bindResourceOnceWithoutNested(ResourceManagerConfiguration aResMgrCfg, String aBindTo, String aRes)
-
void bindResource(ResourceSpecifier aDesc, String aKey, String aUrl) was deprecated and replaced by void bindResourceUsingUrl(ResourceSpecifier aDesc, String aKey, String aUrl)
13.2. Version 2.x to 3.x
The legacy support in uimaFIT 2.x was present to allow compatibility with the pre-Apache uimaFIT versions, which were based on UIMA 2.x. Since uimaFIT 3.x is not compatible with UIMA 2.x anyway, the legacy module has now been removed.
The CasUtil, JCasUtil and FSCollectionFactory classes were adjusted to return results using List instead of the more general Collection. Often, lists are already used internally and then again wrapped into new lists in client code. This API change avoids this in the future.
Several uimaFIT methods were throwing the generic UIMAException. These have been adjusted to declare several of the sub-types of UIMAException instead, so that client code can better handle specific causes of errors.
The signature of CasUtil.selectSingle has been changed to return AnnotationFS. The original signature is available as selectSingleFS.
Various methods that were deprecated in uimaFIT 2.4.0 or earlier have been removed in this release.
For details, please refer to the api-change-report.html file included in the release.
Most methods in the ExternalResourceFactory have seen changes to their names and signatures to avoid problematic ambiguities as well as to be shorter. In general, the External component of the method names was either removed or replaced. So most methods called createExternalResourceDescription are now called createResourceDescription. However, some have also been given a more specific name and/or a slightly different order of parameters. For example, this method
public static ExternalResourceDescription createExternalResourceDescription(
    Class<? extends SharedResourceObject> aInterface, String aUrl, Object... aParams)
was changed to
public static ExternalResourceDescription createSharedResourceDescription(
    String aUrl, Class<? extends SharedResourceObject> aInterface, Object... aParams)
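For illustration, here is a sketch of a call to the renamed method, with the URL first and the implementation class second. The WordListResource class and the file:wordlist.txt URL are hypothetical, and the resource only records the URL instead of parsing any data.

```java
import org.apache.uima.fit.factory.ExternalResourceFactory;
import org.apache.uima.resource.DataResource;
import org.apache.uima.resource.ExternalResourceDescription;
import org.apache.uima.resource.ResourceInitializationException;
import org.apache.uima.resource.SharedResourceObject;

public class SharedResourceExample {
  // Hypothetical shared resource; a real one would parse aData.getInputStream().
  public static class WordListResource implements SharedResourceObject {
    private String source;

    @Override
    public void load(DataResource aData) throws ResourceInitializationException {
      // This sketch only remembers where the data would have been loaded from.
      source = String.valueOf(aData.getUrl());
    }

    public String getSource() {
      return source;
    }
  }

  // uimaFIT 3.x order: URL first, then the implementation class, then optional parameters.
  public static ExternalResourceDescription describe(String url) {
    return ExternalResourceFactory.createSharedResourceDescription(url, WordListResource.class);
  }

  public static void main(String[] args) {
    ExternalResourceDescription desc = describe("file:wordlist.txt");
    System.out.println("Created resource description " + desc.getName());
  }
}
```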
UIMA v3 uses SLF4J. As a consequence, the ExtendedLogger which uimaFIT returned on calls to getLogger() has been removed. Instead, the regular UIMA v3 logger class is returned, which offers methods largely compatible with what ExtendedLogger offered before. However, it is recommended that you go through all your logging calls and replace those which construct the logging message via string concatenation with corresponding calls using placeholders. For example, replace
getLogger().error("Cannot access " + filename, exception);
with
getLogger().error("Cannot access {}", filename, exception);
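The sketch below shows the placeholder style inside a complete component. It assumes uimaFIT 3.x, where getLogger() on the uimaFIT JCasAnnotator_ImplBase returns the UIMA v3 logger. The FileCheckingAnnotator, its "filename" parameter and the FAILURES counter are hypothetical; the counter only exists so that the effect can be observed without inspecting log output.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.descriptor.ConfigurationParameter;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.jcas.JCas;

public class PlaceholderLoggingExample {
  // Hypothetical annotator that checks whether a configured file is readable.
  public static class FileCheckingAnnotator extends JCasAnnotator_ImplBase {
    public static final String PARAM_FILENAME = "filename";

    // Counts failed reads so the sketch has an observable effect besides the log line.
    public static final AtomicInteger FAILURES = new AtomicInteger();

    @ConfigurationParameter(name = PARAM_FILENAME)
    private String filename;

    @Override
    public void process(JCas aJCas) {
      try {
        Files.readAllBytes(Paths.get(filename));
      } catch (IOException e) {
        // Placeholder instead of string concatenation; the exception is the last argument.
        getLogger().error("Cannot access {}", filename, e);
        FAILURES.incrementAndGet();
      }
    }
  }

  // Runs the annotator once on a dummy document and returns the failure count.
  public static int runWithFile(String path) throws Exception {
    AnalysisEngine engine = AnalysisEngineFactory.createEngine(
        FileCheckingAnnotator.class, FileCheckingAnnotator.PARAM_FILENAME, path);
    JCas jcas = JCasFactory.createJCas();
    jcas.setDocumentText("dummy");
    engine.process(jcas);
    engine.destroy();
    return FileCheckingAnnotator.FAILURES.get();
  }

  public static void main(String[] args) throws Exception {
    System.out.println("Failures so far: " + runWithFile("does-not-exist.txt"));
  }
}
```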
Depends on UIMA 3.0.2, Spring Framework 4.3.22 and Java 8.
13.3. Version 2.3.0 to 2.4.0
Depends on UIMA 2.10.2, Spring Framework 3.2.16 and Java 7.
Mind the updated version requirements. There should be no other potentially problematic changes in this upgrade.
13.4. Version 2.2.0 to 2.3.0
The functionality of the uimaFIT CasIOUtil class has been superseded by the core UIMA class CasIOUtils, added in UIMA 2.9.0. The method signatures in the new class are not the same, but it provides more functionality. CasIOUtil has been deprecated, and its documentation now indicates which of the CasIOUtils methods should be used instead.
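As a rough sketch of the replacement API, the following round-trips a CAS through XMI in memory using CasIOUtils.save and CasIOUtils.load. It assumes UIMA 3.x and uimaFIT 3.x (for CasFactory). The class name CasIOUtilsExample is hypothetical. With files, a FileOutputStream and FileInputStream would take the place of the byte-array streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import org.apache.uima.cas.CAS;
import org.apache.uima.cas.SerialFormat;
import org.apache.uima.fit.factory.CasFactory;
import org.apache.uima.util.CasIOUtils;

public class CasIOUtilsExample {
  // Saves a CAS as XMI and loads it into a second CAS with the same type system.
  public static String roundTrip(String text) throws Exception {
    CAS source = CasFactory.createCas();
    source.setDocumentText(text);

    // Serialize as XMI into memory.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    CasIOUtils.save(source, out, SerialFormat.XMI);

    // load() detects the serialization format from the stream contents.
    CAS target = CasFactory.createCas();
    CasIOUtils.load(new ByteArrayInputStream(out.toByteArray()), target);
    return target.getDocumentText();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(roundTrip("Hello world"));
  }
}
```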
Depends on UIMA 2.9.1, Spring Framework 3.2.16 and Java 7.
Mind the updated version requirements. There should be no other potentially problematic changes in this upgrade.
13.5. Version 2.1.0 to 2.2.0
Depends on UIMA 2.8.1, Spring Framework 3.2.16 and Java 7.
Mind the updated version requirements. There should be no other potentially problematic changes in this upgrade.
13.6. Version 2.0.0 to 2.1.0
Depends on UIMA 2.6.0 and Java 6.
No longer throws UIMAException. If this exception was caught, some IDEs may complain about the catch block after upgrading to uimaFIT 2.1.0.
13.7. Version 1.4.0 to 2.0.0
Depends on UIMA 2.4.2.
Compatibility with legacy annotations is provided by the legacy support module.
The Maven group ID has changed from org.uimafit to org.apache.uima.
The artifact ID of the main uimaFIT artifact has been changed from uimafit to uimafit-core.
The base package has been renamed from org.uimafit to org.apache.uima.fit.
A global search/replace in Java files for lines starting with import org.uimafit, replacing that prefix with import org.apache.uima.fit, should work.
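One way to automate this search/replace is sketched below as a small standalone Java tool. The class name ImportMigrator is hypothetical. It also rewrites import static lines, which goes slightly beyond the rule stated above. It overwrites files in place, so it is best run on a version-controlled tree.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ImportMigrator {
  // Matches "import org.uimafit." and "import static org.uimafit." at the start of a line.
  private static final Pattern OLD_IMPORT = Pattern.compile(
      "^([ \\t]*import\\s+(?:static\\s+)?)org\\.uimafit\\.", Pattern.MULTILINE);

  // Rewrites the old package prefix in import lines only; other occurrences are untouched.
  public static String migrateImports(String source) {
    return OLD_IMPORT.matcher(source).replaceAll("$1org.apache.uima.fit.");
  }

  public static void main(String[] args) throws IOException {
    Path root = Paths.get(args.length > 0 ? args[0] : ".");
    List<Path> javaFiles;
    try (Stream<Path> paths = Files.walk(root)) {
      javaFiles = paths.filter(p -> p.toString().endsWith(".java")).collect(Collectors.toList());
    }
    for (Path file : javaFiles) {
      String original = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
      String migrated = migrateImports(original);
      if (!migrated.equals(original)) {
        Files.write(file, migrated.getBytes(StandardCharsets.UTF_8));
        System.out.println("Updated " + file);
      }
    }
  }
}
```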
The default value for the mandatory attribute is now true.
The default name of configuration parameters is now only the name of the annotated field; the class name is no longer prefixed. The method ConfigurationParameterFactory.createConfigurationParameterName(), which was used to generate the prefixed name, has been removed.
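The sketch below illustrates both changes on a generated descriptor. The parameter is declared without a name or mandatory setting, and it is then found under its plain field name and reported as mandatory. It assumes uimaFIT 3.x. ParameterNameExample and ThresholdAnnotator are hypothetical names. The metadata ConfigurationParameter is written fully qualified because it clashes with the uimaFIT annotation of the same simple name.

```java
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.descriptor.ConfigurationParameter;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.jcas.JCas;

public class ParameterNameExample {
  // Hypothetical annotator; no explicit parameter name and no explicit mandatory flag.
  public static class ThresholdAnnotator extends JCasAnnotator_ImplBase {
    @ConfigurationParameter
    private int threshold;

    @Override
    public void process(JCas aJCas) {
      // Intentionally empty: only the generated descriptor is inspected.
    }
  }

  // Looks up the declared parameter by its plain field name.
  public static org.apache.uima.resource.metadata.ConfigurationParameter findThreshold()
      throws Exception {
    // The value is passed using the field name, without any class name prefix.
    AnalysisEngineDescription desc =
        AnalysisEngineFactory.createEngineDescription(ThresholdAnnotator.class, "threshold", 5);
    for (org.apache.uima.resource.metadata.ConfigurationParameter p : desc
        .getAnalysisEngineMetaData().getConfigurationParameterDeclarations()
        .getConfigurationParameters()) {
      if ("threshold".equals(p.getName())) {
        return p;
      }
    }
    return null;
  }

  public static void main(String[] args) throws Exception {
    org.apache.uima.resource.metadata.ConfigurationParameter p = findThreshold();
    System.out.println(p.getName() + " mandatory=" + p.isMandatory());
  }
}
```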
The META-INF/org.uimafit folder was renamed to META-INF/org.apache.uima.fit.
The deprecated JCasUtil.iterate() methods have been removed. JCasUtil.select() should be used instead.
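A minimal sketch of the replacement, assuming uimaFIT 3.x: JCasUtil.select returns a collection that can be used directly in an enhanced for loop. The class name SelectExample and the sample text are hypothetical. The result also contains the DocumentAnnotation, since it is a subtype of Annotation.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.fit.util.JCasUtil;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;

public class SelectExample {
  // Collects the covered text of every Annotation (including the DocumentAnnotation).
  public static List<String> coveredTexts() throws Exception {
    JCas jcas = JCasFactory.createJCas();
    jcas.setDocumentText("Hello world");
    new Annotation(jcas, 0, 5).addToIndexes();
    new Annotation(jcas, 6, 11).addToIndexes();

    List<String> texts = new ArrayList<>();
    // uimaFIT 1.x code called JCasUtil.iterate() here; select() takes its place.
    for (Annotation a : JCasUtil.select(jcas, Annotation.class)) {
      texts.add(a.getCoveredText());
    }
    return texts;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(coveredTexts());
  }
}
```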
All createAggregateXXX and createPrimitiveXXX methods have been renamed to createEngineXXX. The old names are deprecated and will be removed in future versions.
All createAnalysisEngineXXX methods have been renamed to createEngineXXX. The old names are deprecated and will be removed in future versions.
All createDescriptionXXX methods have been renamed to createReaderDescriptionXXX. The old names are deprecated and will be removed in future versions.
All createCollectionReaderXXX methods have been renamed to createReaderXXX. The old names are deprecated and will be removed in future versions.
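The sketch below uses the new engine factory names for both a primitive and an aggregate description. The "formerly" comments apply the renaming pattern described above and are not an exhaustive mapping. It assumes uimaFIT 3.x and uses its NoOpAnnotator. The class name RenamedFactoryExample is hypothetical.

```java
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.fit.component.NoOpAnnotator;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.jcas.JCas;

public class RenamedFactoryExample {
  // Returns {isPrimitive of the primitive description, isPrimitive of the aggregate}.
  public static boolean[] describeAndRun() throws Exception {
    // Formerly createPrimitiveDescription(...)
    AnalysisEngineDescription primitive =
        AnalysisEngineFactory.createEngineDescription(NoOpAnnotator.class);
    // Formerly createAggregateDescription(...); a single delegate keeps the sketch simple.
    AnalysisEngineDescription aggregate = AnalysisEngineFactory.createEngineDescription(primitive);

    // Formerly createAggregate(...) / createPrimitive(...)
    AnalysisEngine engine = AnalysisEngineFactory.createEngine(aggregate);
    JCas jcas = JCasFactory.createJCas();
    jcas.setDocumentText("Hello");
    engine.process(jcas);
    engine.destroy();

    return new boolean[] { primitive.isPrimitive(), aggregate.isPrimitive() };
  }

  public static void main(String[] args) throws Exception {
    boolean[] flags = describeAndRun();
    System.out.println("primitive=" + flags[0] + ", aggregate primitive=" + flags[1]);
  }
}
```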
JCasIterable now only accepts reader and engine descriptions (no instances) and no longer implements the Iterator interface. Instead, the new JCasIterator has been added, which takes over that role from JCasIterable.
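A sketch of iterating over processed documents with JCasIterable built from descriptions only, assuming uimaFIT 3.x. The ArrayReader, its fixed texts and the class name JCasIterableExample are hypothetical. Because the same JCas instance may be reused between iterations, the loop copies out the document text rather than keeping the JCas.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.uima.collection.CollectionReaderDescription;
import org.apache.uima.fit.component.JCasCollectionReader_ImplBase;
import org.apache.uima.fit.component.NoOpAnnotator;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.factory.CollectionReaderFactory;
import org.apache.uima.fit.pipeline.JCasIterable;
import org.apache.uima.jcas.JCas;
import org.apache.uima.util.Progress;
import org.apache.uima.util.ProgressImpl;

public class JCasIterableExample {
  // Hypothetical reader producing one document per entry of a fixed array.
  public static class ArrayReader extends JCasCollectionReader_ImplBase {
    private static final String[] TEXTS = { "first", "second" };
    private int index = 0;

    @Override
    public boolean hasNext() {
      return index < TEXTS.length;
    }

    @Override
    public void getNext(JCas jCas) {
      jCas.setDocumentText(TEXTS[index++]);
    }

    @Override
    public Progress[] getProgress() {
      return new Progress[] { new ProgressImpl(index, TEXTS.length, Progress.ENTITIES) };
    }

    @Override
    public void close() {
      // Nothing to release in this sketch.
    }
  }

  public static List<String> collectTexts() throws Exception {
    // Formerly createDescription(...) on CollectionReaderFactory
    CollectionReaderDescription reader =
        CollectionReaderFactory.createReaderDescription(ArrayReader.class);
    List<String> texts = new ArrayList<>();
    // JCasIterable receives descriptions and instantiates the components itself.
    for (JCas jcas : new JCasIterable(reader,
        AnalysisEngineFactory.createEngineDescription(NoOpAnnotator.class))) {
      texts.add(jcas.getDocumentText());
    }
    return texts;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(collectTexts());
  }
}
```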
org.uimafit.component.xwriter.CASDumpWriter has been renamed to org.apache.uima.fit.component.CasDumpWriter.
CpePipeline has been moved to a separate module with the artifact ID uimafit-cpe to reduce the dependencies incurred by the main uimaFIT artifact.
The XWriter and associated file namers have been removed, as they were much more complex than actually needed. As an alternative, CasIOUtil has been introduced, providing several convenience methods to read/write JCas/CAS data.
Methods only loading JCas data have been removed from JCasFactory. The new methods in CasIOUtil can be used instead.