Schema Elements
Solr stores details about the field types and fields it is expected to understand in a schema file.
Solr’s Schema File
The name and location of Solr’s schema file may vary depending on how you initially configured Solr or if you modified it later.
- managed-schema.xml is the name for the schema file Solr uses by default to support making schema changes at runtime via the Schema API or Schemaless Mode features. You may explicitly configure the managed schema features to use an alternative filename if you choose, but the contents of the file are still updated automatically by Solr.
- schema.xml is the traditional name for a schema file which can be edited manually by users who use the ClassicIndexSchemaFactory.
- If you are using SolrCloud you may not be able to find any file by these names on the local filesystem. You will only be able to see the schema through the Schema API (if enabled) or through the Solr Admin UI’s Cloud Screens.
Whichever file name is in use in your installation, the structure of the file is the same. However, the way you interact with the file differs. If you are using the managed schema, you are expected to interact with the file only through the Schema API and never to edit it manually. If you do not use the managed schema, you can only edit the file manually; the Schema API will not support any modifications.
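The choice between the two modes is made in solrconfig.xml rather than in the schema itself. The fragment below is a minimal sketch of the two alternatives; the mutable setting and the resource name shown are illustrative assumptions, and only one schemaFactory element should be present in a real configuration.

```xml
<!-- Option 1: managed schema, modified through the Schema API -->
<schemaFactory class="ManagedIndexSchemaFactory">
  <bool name="mutable">true</bool>
  <!-- Assumed file name; change it only if you want an alternative name -->
  <str name="managedSchemaResourceName">managed-schema.xml</str>
</schemaFactory>

<!-- Option 2: classic schema.xml, edited by hand (use instead of option 1) -->
<!-- <schemaFactory class="ClassicIndexSchemaFactory"/> -->
```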
Note that if you are not using the Schema API yet you do use SolrCloud, you will need to interact with the schema file through ZooKeeper using upconfig and downconfig commands to make a local copy and upload your changes.
The options for doing this are described in Solr Control Script Reference and ZooKeeper File Management.
Structure of the Schema File
This example is not real XML, but shows the primary elements that make up a schema file.
<schema>
  <types>
  <fieldType>
  <fields>
  <field>
  <copyField>
  <dynamicField>
  <similarity>
  <uniqueKey>
</schema>
The most commonly defined elements are types and fields, where the field types and the actual fields are configured.
The sections Field Type Definitions and Properties, and Fields describe how to configure these for your schema.
These are supplemented by copyFields, described in Copy Fields, and dynamicFields, described in Dynamic Fields.
The uniqueKey, described in Unique Key below, is not strictly required but should nearly always be defined.
A default similarity will be used, but can be modified as described in the section Similarity below.
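Because the outline above is not real XML, a small well-formed schema may help show how the elements fit together. This is only a sketch: the schema name, version attribute, field names, and field type names are illustrative assumptions, the optional types and fields wrapper tags are omitted, and no similarity is declared so the default applies.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<schema name="example" version="1.6">
  <!-- Field types: one non-analyzed type and one analyzed text type -->
  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <!-- Explicitly named fields -->
  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
  <field name="title" type="text_general" indexed="true" stored="true"/>
  <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

  <!-- Any field whose name ends in _s is treated as a string -->
  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>

  <!-- Copy title into the catch-all text field at index time -->
  <copyField source="title" dest="text"/>

  <uniqueKey>id</uniqueKey>
</schema>
```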
Types and fields are optional tags
Note that the types and fields tags are optional.
Unique Key
The uniqueKey element specifies which field is a unique identifier for documents.
Although uniqueKey is not required, it is nearly always warranted by your application design.
For example, uniqueKey should be used if you will ever update a document in the index.
You can define the unique key field by naming it:
<uniqueKey>id</uniqueKey>
Schema defaults and copyFields cannot be used to populate the uniqueKey field.
The fieldType of uniqueKey must not be analyzed and must not be any of the *PointField types.
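As a sketch of a definition that satisfies these constraints, the fragment below pairs a non-analyzed StrField type with a single-valued field that has no default value. The names id and string are illustrative assumptions, not requirements.

```xml
<!-- StrField is not analyzed: the value is indexed verbatim -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- Single-valued, no default attribute, and not the destination of any copyField -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>

<uniqueKey>id</uniqueKey>
```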
You can use UUIDUpdateProcessorFactory to have uniqueKey values generated automatically.
Further, the operation will fail if the uniqueKey field is used but is multivalued (or inherits the multivalued setting from its fieldType).
However, uniqueKey will continue to work as long as the field is properly used.
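Automatic generation is configured in solrconfig.xml, not in the schema. The following is a minimal sketch that assumes the uniqueKey field is named id and uses a hypothetical chain name, uuid; the chain would still need to be selected per request (for example with the update.chain parameter) or marked as the default chain.

```xml
<updateRequestProcessorChain name="uuid">
  <!-- Adds a generated UUID to the id field when a document arrives without one -->
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- RunUpdateProcessorFactory must come last so the update is actually executed -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```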
Similarity
Similarity is a Lucene class used to score a document in searching.
Each collection has one "global" Similarity.
By default, Solr uses an implicit SchemaSimilarityFactory which allows individual field types to be configured with a "per-type" specific Similarity and implicitly uses BM25Similarity for any field type which does not have an explicit Similarity.
This default behavior can be overridden by declaring a top level <similarity/> element in your schema, outside of any single field type.
This similarity declaration can either refer directly to the name of a class with a no-argument constructor, such as BM25Similarity in this example:
<similarity class="org.apache.lucene.search.similarities.BM25Similarity"/>
or reference a SimilarityFactory implementation:
<similarity class="solr.BM25SimilarityFactory"/>
When using the similarity factory, it is possible to specify optional initialization parameters:
<similarity class="solr.DFRSimilarityFactory">
  <str name="basicModel">P</str>
  <str name="afterEffect">L</str>
  <str name="normalization">H2</str>
  <float name="c">7</float>
</similarity>
In most cases, specifying a global similarity like this will cause an error if your schema also includes field type specific <similarity/> declarations.
One key exception is that you may explicitly declare a SchemaSimilarityFactory and use defaultSimFromFieldType to name a field type whose configured similarity becomes the default for all field types that do not declare an explicit Similarity:
<similarity class="solr.SchemaSimilarityFactory">
  <str name="defaultSimFromFieldType">text_dfr</str>
</similarity>
<fieldType name="text_dfr" class="solr.TextField">
  <analyzer ... />
  <similarity class="solr.DFRSimilarityFactory">
    <str name="basicModel">I(F)</str>
    <str name="afterEffect">B</str>
    <str name="normalization">H3</str>
    <float name="mu">900</float>
  </similarity>
</fieldType>
<fieldType name="text_ib" class="solr.TextField">
  <analyzer ... />
  <similarity class="solr.IBSimilarityFactory">
    <str name="distribution">SPL</str>
    <str name="lambda">DF</str>
    <str name="normalization">H2</str>
  </similarity>
</fieldType>
<fieldType name="text_other" class="solr.TextField">
  <analyzer ... />
</fieldType>
In the example above, IBSimilarityFactory (using the Information-Based model) will be used for any fields of type text_ib, while DFRSimilarityFactory (divergence from random) will be used for any fields of type text_dfr, as well as any fields using a type that does not explicitly specify a <similarity/>.
If SchemaSimilarityFactory is explicitly declared without configuring a defaultSimFromFieldType, then BM25Similarity is implicitly used as the default.
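That case can be sketched as follows: the global factory is declared with no defaultSimFromFieldType, so BM25Similarity remains the fallback, while one field type supplies its own tuned BM25 settings. The field type name text_bm25_tuned and the analyzer are hypothetical, and the k1 and b values are illustrative rather than recommended.

```xml
<!-- Explicit global factory, no defaultSimFromFieldType: BM25Similarity is the fallback -->
<similarity class="solr.SchemaSimilarityFactory"/>

<!-- Hypothetical field type that overrides the fallback for its own fields -->
<fieldType name="text_bm25_tuned" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
  <similarity class="solr.BM25SimilarityFactory">
    <float name="k1">1.2</float>
    <float name="b">0.75</float>
  </similarity>
</fieldType>
```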
In addition to the various factories mentioned on this page, there are several other similarity implementations that can be used, such as SweetSpotSimilarityFactory and ClassicSimilarityFactory.
For details, see the Solr Javadocs for the similarity factories.