Schema Elements

Solr stores details about the field types and fields it is expected to understand in a schema file.

Solr’s Schema File

The name and location of Solr’s schema file may vary depending on how you initially configured Solr or if you modified it later.

  • managed-schema.xml is the name for the schema file Solr uses by default to support making schema changes at runtime via the Schema API, or Schemaless Mode features.

    You may explicitly configure the managed schema features to use an alternative filename if you choose, but the contents of the files are still updated automatically by Solr.

  • schema.xml is the traditional name for a schema file which can be edited manually by users who use the ClassicIndexSchemaFactory.

  • If you are using SolrCloud you may not be able to find any file by these names on the local filesystem. You will only be able to see the schema through the Schema API (if enabled) or through the Solr Admin UI’s Cloud Screens.

Whichever name of the file in use in your installation, the structure of the file is not changed. However, the way you interact with the file will change. If you are using the managed schema, it is expected that you only interact with the file with the Schema API, and never make manual edits. If you do not use the managed schema, you will only be able to make manual edits to the file, the Schema API will not support any modifications.

Note that if you are not using the Schema API yet you do use SolrCloud, you will need to interact with the schema file through ZooKeeper using upconfig and downconfig commands to make a local copy and upload your changes. The options for doing this are described in Solr Control Script Reference and ZooKeeper File Management.

Structure of the Schema File

This example is not real XML, but shows the primary elements that make up a schema file.

<schema>
  <types>
    <fieldType>
  <fields>
    <field>
  <copyField>
  <dynamicField>
  <similarity>
  <uniqueKey>
</schema>

The most commonly defined elements are types and fields, where the field types and the actual fields are configured. The sections Field Type Definitions and Properties, and Fields describe how to configure these for your schema.

These are supplemented by copyFields, described in Copy Fields, and dynamicFields, described in Dynamic Fields.

The uniqueKey described in Unique Key below must always be defined.

A default similarity will be used, but can be modified as described in the section Similarity below.

Types and fields are optional tags

Note that the types and fields sections are optional, meaning you are free to mix field, dynamicField, copyField and fieldType definitions on the top level. This allows for a more logical grouping of related elements (such as fields grouped with their field type definition) in your schema.

Unique Key

The uniqueKey element specifies which field is a unique identifier for documents. Although uniqueKey is not required, it is nearly always warranted by your application design. For example, uniqueKey should be used if you will ever update a document in the index.

You can define the unique key field by naming it:

<uniqueKey>id</uniqueKey>

Schema defaults and copyFields cannot be used to populate the uniqueKey field. The fieldType of uniqueKey must not be analyzed and must not be any of the *PointField types. You can use UUIDUpdateProcessorFactory to have uniqueKey values generated automatically.

Further, the operation will fail if the uniqueKey field is used, but is multivalued (or inherits the multivalued-ness from the fieldtype). However, uniqueKey will continue to work, as long as the field is properly used.

Similarity

Similarity is a Lucene class used to score a document in searching.

Each collection has one "global" Similarity. By default, Solr uses an implicit SchemaSimilarityFactory which allows individual field types to be configured with a "per-type" specific Similarity and implicitly uses BM25Similarity for any field type which does not have an explicit Similarity.

This default behavior can be overridden by declaring a top level <similarity/> element in your schema, outside of any single field type. This similarity declaration can either refer directly to the name of a class with a no-argument constructor, such as in this example showing BM25Similarity:

<similarity class="org.apache.lucene.search.similarities.BM25Similarity"/>

or by referencing a SimilarityFactory implementation:

<similarity class="solr.BM25SimilarityFactory"/>

When using the similarity factory, it is possible to specify optional initialization parameters:

<similarity class="solr.DFRSimilarityFactory">
  <str name="basicModel">P</str>
  <str name="afterEffect">L</str>
  <str name="normalization">H2</str>
  <float name="c">7</float>
</similarity>

In most cases, specifying global level similarity like this will cause an error if your schema also includes field type specific <similarity/> declarations. One key exception to this is that you may explicitly declare a SchemaSimilarityFactory and specify what that default behavior will be for all field types that do not declare an explicit Similarity using the name of field type (specified by defaultSimFromFieldType) that is configured with a specific similarity:

<similarity class="solr.SchemaSimilarityFactory">
  <str name="defaultSimFromFieldType">text_dfr</str>
</similarity>

<fieldType name="text_dfr" class="solr.TextField">
  <analyzer ... />
  <similarity class="solr.DFRSimilarityFactory">
    <str name="basicModel">I(F)</str>
    <str name="afterEffect">B</str>
    <str name="normalization">H3</str>
    <float name="mu">900</float>
  </similarity>
</fieldType>

<fieldType name="text_ib" class="solr.TextField">
  <analyzer ... />
  <similarity class="solr.IBSimilarityFactory">
    <str name="distribution">SPL</str>
    <str name="lambda">DF</str>
    <str name="normalization">H2</str>
  </similarity>
</fieldType>

<fieldType name="text_other" class="solr.TextField">
  <analyzer ... />
</fieldType>

In the example above IBSimilarityFactory (using the Information-Based model) will be used for any fields of type text_ib, while DFRSimilarityFactory (divergence from random) will be used for any fields of type text_dfr, as well as any fields using a type that does not explicitly specify a <similarity/>.

If SchemaSimilarityFactory is explicitly declared without configuring a defaultSimFromFieldType, then BM25Similarity is implicitly used as the default.

In addition to the various factories mentioned on this page, there are several other similarity implementations that can be used such as the SweetSpotSimilarityFactory, ClassicSimilarityFactory etc. For details, see the Solr Javadocs for the similarity factories.