Stream Tool
The Stream tool allows you to run a Streaming Expression in Solr and see the results from the command line.
It is very similar to the Stream Screen, but is part of the bin/solr CLI.
Being a CLI, you can pipe content into it like other Unix-style tools, and you can also run many kinds of expressions locally.
The Stream Tool is classified as "experimental". It may change in backwards-incompatible ways as it evolves to cover additional functionality.
To run it, open a terminal and enter:
$ bin/solr stream --header -c techproducts --delimiter=\| 'search(techproducts,q="name:memory",fl="name,price")'
This will run the provided streaming expression on the techproducts collection on your local Solr and produce:
name|price
CORSAIR XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail|185.0
CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail|74.99
A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM|
Notice how we used the pipe character (|) as the delimiter? It needed a backslash escape so that the shell would not treat it as a pipe between commands.
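The backslash is one of several equivalent ways to keep the shell from interpreting the pipe character. The sketch below uses a hypothetical helper, show_arg, which only prints the argument it receives, to show that single or double quotes hand a program the same value as the backslash form.

```shell
# show_arg is a hypothetical helper for illustration only; it prints its first
# argument exactly as the shell passes it to a program.
show_arg() {
  printf '%s\n' "$1"
}

show_arg --delimiter=\|     # backslash escape, as in the command above
show_arg --delimiter='|'    # single quotes
show_arg "--delimiter=|"    # double quotes
```

All three calls receive the same argument, so any of these forms should be interchangeable on the bin/solr stream command line.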
You can also specify a file with the suffix .expr containing your streaming expression.
This is useful for longer expressions or if you are experiencing shell character-escaping issues with your expression.
Assuming you have created the file stream.expr with the contents:
# Stream a search
search(
  techproducts,
  q="name:memory",
  fl="name,price",
  sort="price desc"
)
Then you can run it on the Solr collection techproducts, specifying you want a header row:
$ bin/solr stream --header -c techproducts stream.expr
And this will produce:
name   price
CORSAIR XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail   185.0
CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail   74.99
A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM
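One way to create such a file without running into the shell-escaping issues mentioned above is a here-document with a quoted delimiter, which the shell copies verbatim. This is a minimal sketch; the temporary directory and the EXPR_DIR and EXPR_FILE variables are illustrative choices, not part of the tool.

```shell
# Write the expression to a .expr file. Quoting the here-document delimiter
# ('EOF') stops the shell from expanding or unescaping anything inside it.
EXPR_DIR=$(mktemp -d)
EXPR_FILE="$EXPR_DIR/stream.expr"

cat > "$EXPR_FILE" <<'EOF'
# Stream a search
search(
  techproducts,
  q="name:memory",
  fl="name,price",
  sort="price desc"
)
EOF

# Show the file; the double quotes inside the expression are preserved.
cat "$EXPR_FILE"
```

The resulting file can then be passed as the last argument, in the same way as stream.expr in the command above.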
Using the bin/solr stream Tool
To use the tool you need to provide the streaming expression either inline as the last argument, or provide a file ending in .expr that contains the expression.
The --help (or simply -h) option will output information on its usage (i.e., bin/solr stream --help):
usage: bin/solr stream [--array-delimiter <CHARACTER>] [-c <NAME>] [--delimiter <CHARACTER>] [-e <ENVIRONMENT>] [-f
       <FIELDS>] [-h] [--header] [-s <HOST>] [-u <credentials>] [-v] [-z <HOST>]

List of options:
    --array-delimiter <CHARACTER>   The delimiter multi-valued fields. Default to using a pipe (|) delimiter.
 -c,--name <NAME>                   Name of the specific collection to execute expression on if the execution is set
                                    to 'remote'. Required for 'remote' execution environment.
    --delimiter <CHARACTER>         The output delimiter. Default to using three spaces.
 -e,--execution <ENVIRONMENT>       Execution environment is either 'local' (i.e CLI process) or via a 'remote' Solr
                                    server. Default environment is 'remote'.
 -f,--fields <FIELDS>               The fields in the tuples to output. Defaults to fields in the first tuple of result
                                    set.
 -h,--help                          Print this message.
    --header                        Specify to include a header line.
 -s,--solr-url <HOST>               Base Solr URL, which can be used to determine the zk-host if that's not known;
                                    defaults to: http://localhost:8983.
 -u,--credentials <credentials>     Credentials in the format username:password. Example: --credentials solr:SolrRocks
 -v,--verbose                       Enable verbose command output.
 -z,--zk-host <HOST>                Zookeeper connection string; unnecessary if ZK_HOST is defined in solr.in.sh;
                                    otherwise, defaults to localhost:9983.
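Because --header and --delimiter produce plain delimited text, the output can be handed to standard Unix tools. As a sketch, the hypothetical function names_above below reads pipe-delimited rows and prints the names of items priced above a threshold. The sample rows are copied from the example output earlier on this page so that the snippet runs without a Solr instance.

```shell
# names_above is a hypothetical helper: it reads "name|price" rows on stdin,
# skips the header line and any row with an empty price, and prints the names
# whose price exceeds the threshold given as the first argument.
names_above() {
  awk -F'|' -v min="$1" 'NR > 1 && $2 != "" && ($2 + 0) > min { print $1 }'
}

# Sample input taken from the example output shown earlier.
names_above 100 <<'EOF'
name|price
CORSAIR XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail|185.0
CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail|74.99
A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM|
EOF
```

With the sample rows, only the first CORSAIR item is printed. Against a live Solr, the output of a bin/solr stream command run with --header and --delimiter=\| could be piped into names_above in the same way.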
Examples Using bin/solr stream
There are several ways to use bin/solr stream.
This section presents several examples.
Executing Expression Locally
Streaming Expressions by default are executed in the Solr cluster. However, there are use cases where you want to interact with data in your local environment, or even run a streaming expression independent of Solr.
The Stream Tool allows you to specify --execution local to process the expression in the Solr CLI’s JVM.
However, "local" processing does not imply a networking sandbox.
Many streaming expressions, such as search and update, will make network requests to remote Solr nodes if configured to do so, even in "local" execution mode.
Assuming you have created the file load_data.expr with the contents:
# Index CSV File
update(
  gettingstarted,
  parseCSV(
    cat(./example/exampledocs/books.csv, maxLines=2)
  )
)
Running this expression will read in the local file and send the first two lines to the collection gettingstarted.
Want to send data to a remote Solr? Pass in --solr-url http://solr.remote:8983.
$ bin/solr stream --execution local --header load_data.expr
The Stream Tool adds some Streaming Expressions specifically for local use:
- stdin() lets you pipe data directly into the streaming expression.
- cat() allows you to read ANY file on your local system. This is different from the cat operator that runs in Solr, which only accesses $SOLR_HOME/userfiles/.
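As a sketch of how stdin() might be used, the snippet below pipes the first lines of the CSV file from the earlier example into an expression shaped like load_data.expr, with stdin() in place of cat(). The combination update(gettingstarted,parseCSV(stdin())) is an assumption based on the expressions shown above, not a verified recipe. The SOLR_BIN and CSV variables and the guard around the call are illustrative, so the snippet only prints a message when Solr is not present in the current directory.

```shell
# Assumed expression: same shape as load_data.expr, but reading standard input.
EXPR='update(gettingstarted,parseCSV(stdin()))'
SOLR_BIN=./bin/solr                    # illustrative location of the Solr CLI
CSV=./example/exampledocs/books.csv    # file used in the earlier example

if [ -x "$SOLR_BIN" ] && [ -f "$CSV" ]; then
  # Send only the first three lines of the file to the local CLI process.
  head -n 3 "$CSV" | "$SOLR_BIN" stream --execution local "$EXPR"
else
  echo "Solr CLI or sample file not found; would run: $SOLR_BIN stream --execution local '$EXPR'"
fi
```

Keeping the expression in single quotes means the shell passes the parentheses through unchanged.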
Caveats:
- You don’t get to use any of the parallelization support that is available when you run the expression on the cluster.
- Anything that requires Solr internals access won’t work with the --execution local context.