GeoWave Command-Line Interface

Overview

The Command-Line Interface provides a way to execute a multitude of common operations on GeoWave data stores without having to use the Programmatic API. It allows users to manage data stores, indices, statistics, and more. All command options that are marked with * are required for the command to execute.

Configuration

The CLI uses a local configuration file to store sets of data store connection parameters aliased by a store name. Most GeoWave commands ask for a store name and use the configuration file to determine which connection parameters should be used. It also stores connection information for GeoServer, AWS, and HDFS for commands that use those services. This configuration file is generally stored in the user’s home directory, although an alternate configuration file can be specified when running commands.
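For reference, the configuration file is a simple properties file. The sketch below illustrates the kind of content it can hold; the store.example.opts.batchWriteSize key follows the config set examples later in this document, while the other key names shown are assumptions for illustration only:

```properties
# Connection parameters for a data store aliased as "example"
# (key names other than batchWriteSize are illustrative)
store.example.type=rocksdb
store.example.opts.dir=/data/geowave/rocksdb
store.example.opts.batchWriteSize=1000
```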

General Usage

The root of all GeoWave CLI commands is the base geowave command.

$ geowave

This will display a list of all available top-level commands along with a brief description of each.

Version

$ geowave --version

The --version flag displays information about the installed version of GeoWave, including the version number, build arguments, and revision information.

General Flags

These flags can be optionally supplied to any GeoWave command, and should be supplied before the command itself.

Config File

The --config-file flag causes GeoWave to use an alternate configuration file. The supplied file path should include the file name (e.g. --config-file /mnt/config.properties). This can be useful if you have multiple projects that use GeoWave and want to keep the configuration for those data stores separate from each other.

$ geowave --config-file <path_to_file> <command>

Debug

The --debug flag causes all DEBUG, INFO, WARN, and ERROR log events to be output to the console. By default, only WARN and ERROR log events are displayed.

$ geowave --debug <command>

Help Command

Adding help before any CLI command will show that command’s options and their defaults.

$ geowave help <command>

For example, using the help command on index add would result in the following output:

$ geowave help index add
Usage: geowave index add [options] <store name> <index name>
  Options:
    -np, --numPartitions
       The number of partitions.  Default partitions will be 1.
       Default: 1
    -ps, --partitionStrategy
       The partition strategy to use.  Default will be none.
       Default: NONE
       Possible Values: [NONE, HASH, ROUND_ROBIN]
  * -t, --type
       The type of index, such as spatial, or spatial_temporal

Explain Command

The explain command is similar to the help command in its usage, but shows all options, including hidden ones. It can be a great way to make sure your parameters are correct before issuing a command.

$ geowave explain <command>

For example, if you wanted to add a spatial index to a store named test-store but weren’t sure what all of the options available to you were, you could do the following:

$ geowave explain index add -t spatial test-store spatial-idx
Command: geowave [options] <subcommand> ...

                VALUE  NEEDED  PARAMETER NAMES
----------------------------------------------
{                    }         -cf, --config-file,
{                    }         --debug,
{                    }         --version,

Command: add [options]

                VALUE  NEEDED  PARAMETER NAMES
----------------------------------------------
{           EPSG:4326}         -c, --crs,
{               false}         -fp, --fullGeometryPrecision,
{                   7}         -gp, --geometryPrecision,
{                   1}         -np, --numPartitions,
{                NONE}         -ps, --partitionStrategy,
{               false}         --storeTime,
{             spatial}         -t, --type,

Expects: <store name> <index name>
Specified:
test-store spatial-idx

The output is broken down into two sections. The first section shows all of the options available on the geowave command. If you wanted to use any of these options, they would need to be specified before index add. The second section shows all of the options available on the index add command. Some commands contain options that, when specified, may reveal more options. In this case, the -t spatial option has revealed some additional configuration options that we could apply to the spatial index. Another command where this is useful is the store add command, where each data store type specified by the -t <store_type> option has a different set of configuration options.

Config Commands

Commands that affect the local GeoWave configuration.

Configure AWS

NAME

geowave-config-aws - configure GeoWave CLI for AWS S3 connections

SYNOPSIS

geowave config aws <AWS S3 endpoint URL>

DESCRIPTION

This command creates a local configuration for AWS S3 connections that is used by commands that interface with S3.

EXAMPLES

Configure GeoWave to use an S3 bucket on us-west-2 called mybucket:

geowave config aws https://s3.us-west-2.amazonaws.com/mybucket

Configure GeoServer

NAME

geowave-config-geoserver - configure GeoWave CLI to connect to a GeoServer instance

SYNOPSIS

geowave config geoserver [options] <GeoServer URL>

DESCRIPTION

This command creates a local configuration for connecting to GeoServer which is used by geoserver or gs commands.

OPTIONS

-p, --password <password>

GeoServer Password - Can be specified as pass:<password>, file:<local file containing the password>, propfile:<local properties file containing the password>:<property file key>, env:<variable containing the pass>, or stdin.

-u, --username <username>

GeoServer User

-ws, --workspace <workspace>

GeoServer Default Workspace

SSL CONFIGURATION OPTIONS

--sslKeyManagerAlgorithm <algorithm>

Specify the algorithm to use for the keystore.

--sslKeyManagerProvider <provider>

Specify the key manager factory provider.

--sslKeyPassword <password>

Specify the password to be used to access the server certificate from the specified keystore file. Can be specified as pass:<password>, file:<local file containing the password>, propfile:<local properties file containing the password>:<property file key>, env:<variable containing the pass>, or stdin.

--sslKeyStorePassword <password>

Specify the password to use to access the keystore file. Can be specified as pass:<password>, file:<local file containing the password>, propfile:<local properties file containing the password>:<property file key>, env:<variable containing the pass>, or stdin.

--sslKeyStorePath <path>

Specify the absolute path to where the keystore file is located on system. The keystore contains the server certificate to be loaded.

--sslKeyStoreProvider <provider>

Specify the name of the keystore provider to be used for the server certificate.

--sslKeyStoreType <type>

The type of keystore file to be used for the server certificate.

--sslSecurityProtocol <protocol>

Specify the Transport Layer Security (TLS) protocol to use when connecting to the server. By default, the system will use TLS.

--sslTrustManagerAlgorithm <algorithm>

Specify the algorithm to use for the truststore.

--sslTrustManagerProvider <provider>

Specify the trust manager factory provider.

--sslTrustStorePassword <password>

Specify the password to use to access the truststore file. Can be specified as pass:<password>, file:<local file containing the password>, propfile:<local properties file containing the password>:<property file key>, env:<variable containing the pass>, or stdin.

--sslTrustStorePath <path>

Specify the absolute path to where truststore file is located on system. The truststore file is used to validate client certificates.

--sslTrustStoreProvider <provider>

Specify the name of the truststore provider to be used for the server certificate.

--sslTrustStoreType <type>

Specify the type of key store used for the truststore, i.e. JKS (Java KeyStore).

EXAMPLES

Configure GeoWave to use locally running GeoServer:

geowave config geoserver "http://localhost:8080/geoserver"

Configure GeoWave to use GeoServer running on another host:

geowave config geoserver "${HOSTNAME}:8080"

Configure GeoWave to use a particular workspace on a GeoServer instance:

geowave config geoserver -ws myWorkspace "http://localhost:8080/geoserver"

Configure HDFS

NAME

geowave-config-hdfs - configure the GeoWave CLI to connect to HDFS

SYNOPSIS

geowave config hdfs <HDFS DefaultFS URL>

DESCRIPTION

This command creates a local configuration for HDFS connections, which is used by commands that interface with HDFS.

EXAMPLES

Configure GeoWave to use locally running HDFS:

geowave config hdfs localhost:8020

List Configured Properties

NAME

geowave-config-list - list all configured properties

SYNOPSIS

geowave config list [options]

DESCRIPTION

This command lists all properties in the local configuration. The list can be filtered with a regular expression using the -f or --filter option. A useful regular expression might be a store name, to see all of the configured properties for a particular data store.

OPTIONS

-f, --filter <regex>

Filter list by a regular expression.

EXAMPLES

List all configuration properties:

geowave config list

List all configuration properties on a data store called example:

geowave config list -f example
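To illustrate how a regular-expression filter narrows the property list (a sketch of the matching concept only; whether GeoWave anchors the pattern or uses search-style matching is an assumption, and the property names are illustrative):

```python
import re

# Illustrative: filter configuration property names with a regex,
# as "geowave config list -f <regex>" does. A store name like
# "example" matches every property configured for that store.
props = {
    "store.example.type": "rocksdb",
    "store.example.opts.batchWriteSize": "1000",
    "store.other.type": "accumulo",
}
pattern = re.compile("example")
filtered = {k: v for k, v in props.items() if pattern.search(k)}
print(sorted(filtered))
```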

Configure Cryptography Key

NAME

geowave-config-newcryptokey - generate a new security cryptography key for use with configuration properties

SYNOPSIS

geowave config newcryptokey

DESCRIPTION

This command will generate a new security cryptography key for use with configuration properties. This is primarily used if there is a need to re-encrypt the local configurations based on a new security token, should the old one have been compromised.

EXAMPLES

Generate a new cryptography key:

geowave config newcryptokey

Set Configuration Property

NAME

geowave-config-set - sets a property in the local configuration

SYNOPSIS

geowave config set [options] <name> <value>

DESCRIPTION

This command sets a property in the local configuration. This can be used to update a particular configured property of a data store.

OPTIONS

--password

Specify that the value being set is a password and should be encrypted in the configuration.

EXAMPLES

Update the batch write size of a RocksDB data store named example:

geowave config set store.example.opts.batchWriteSize 1000

Update the password for an Accumulo data store named example:

geowave config set --password store.example.opts.password someNewPassword

Store Commands

Commands for managing GeoWave data stores.

Add Store

NAME

geowave-store-add - Add a data store to the GeoWave configuration

SYNOPSIS

geowave store add [options] <name>

DESCRIPTION

This command adds a new store to the GeoWave configuration. The store name can then be used by other commands for interfacing with the configured data store.

OPTIONS

-d, --default

Make this the default store in all operations

*-t, --type <arg>

The type of store. A list of available store types can be found using the store listplugins command.

All core data stores have these options:

--gwNamespace <namespace>

The GeoWave namespace. By default, no namespace is used.

--enableServerSideLibrary <enabled>

Enable server-side operations if possible. Default is true.

--enableSecondaryIndexing

If specified, secondary indexing will be used.

--enableVisibility <enabled>

If specified, visibility will be explicitly enabled or disabled. Default is unspecified.

--maxRangeDecomposition <count>

The maximum number of ranges to use when breaking down queries.

--aggregationMaxRangeDecomposition <count>

The maximum number of ranges to use when breaking down aggregation queries.

When the accumulo type option is used, additional options are:

* -i, --instance <instance>

The Accumulo instance ID.

-u, --user <user>

A valid Accumulo user ID. If not given and using SASL, the active Kerberos user will be used.

-k, --keytab <keytab>

Path to keytab file for Kerberos authentication. If using SASL, this is required.

--sasl <sasl>

Use SASL to connect to Accumulo (Kerberos).

-p, --password <password>

The password for the user. Can be specified as pass:<password>, file:<local file containing the password>, propfile:<local properties file containing the password>:<property file key>, env:<variable containing the pass>, or stdin.

*-z, --zookeeper <servers>

A comma-separated list of ZooKeeper servers that an Accumulo instance is using.

When the hbase type option is used, additional options are:

* -z, --zookeeper <servers>

A comma-separated list of ZooKeeper servers that an HBase instance is using.

--coprocessorJar <path>

Path (HDFS URL) to the JAR containing coprocessor classes.

--disableVerifyCoprocessors

If specified, disable coprocessor verification, which ensures that coprocessors have been added to the HBase table prior to executing server-side operations.

--scanCacheSize <size>

The number of rows passed to each scanner (higher values will enable faster scanners, but will use more memory).

When the redis type option is used, additional options are:

* -a, --address <address>

The address to connect to, such as redis://127.0.0.1:6379.

--compression <compression>

The compression to use. Possible values are snappy, lz4, and none. Default is snappy.

--serialization <serialization>

The serialization codec to use. Possible values are fst and jdk, default is fst. This codec is only used for the data index when secondary indexing is enabled.

When the rocksdb type option is used, additional options are:

--dir <path>

The directory to read/write to. Defaults to "rocksdb" in the working directory.

--compactOnWrite <enabled>

Whether to compact on every write, if false it will only compact on merge. Default is true.

--batchWriteSize <count>

The size (in records) for each batched write. Any value less than or equal to 1 will use synchronous single-record writes without batching. Default is 1000.

When the filesystem type option is used, additional options are:

--dir <path>

The directory to read/write to. Defaults to "geowave" in the working directory.

--format <format>

Optionally use a formatter configured with Java SPI of type org.locationtech.geowave.datastore.filesystem.FileSystemDataFormatterSpi. Defaults to 'binary' which is a compact geowave serialization. Use geowave util filesystem listformats to see available formats.

When the cassandra type option is used, additional options are:

* --contactPoints <contact points>

A single contact point or a comma delimited set of contact points to connect to the Cassandra cluster.

--replicas <count>

The number of replicas to use when creating a new keyspace. Default is 3.

--durableWrites <enabled>

Whether to write to commit log for durability, configured only on creation of new keyspace. Default is true.

--batchWriteSize <count>

The number of inserts in a batch write. Default is 50.

When the dynamodb type option is used, additional options are:

* --endpoint <endpoint>

The endpoint to connect to. Required if --region is not specified.

* --region <region>

The AWS region to use. Required if --endpoint is not specified.

--initialWriteCapacity <count>

The maximum number of writes consumed per second before throttling occurs. Default is 5.

--initialReadCapacity <count>

The maximum number of strongly consistent reads consumed per second before throttling occurs. Default is 5.

--maxConnections <count>

The maximum number of open http(s) connections active at any given time. Default is 50.

--protocol <protocol>

The protocol to use. Possible values are HTTP or HTTPS, default is HTTPS.

--cacheResponseMetadata <enabled>

Whether to cache responses from AWS. High performance systems can disable this but debugging will be more difficult. Default is true.

When the kudu type option is used, additional options are:

* --kuduMaster <url>

A URL for the Kudu master node.

When the bigtable type option is used, additional options are:

--projectId <project>

The Bigtable project to connect to. Default is geowave-bigtable-project-id.

--instanceId <instance>

The Bigtable instance to connect to. Default is geowave-bigtable-instance-id.

--scanCacheSize <size>

The number of rows passed to each scanner (higher values will enable faster scanners, but will use more memory).

EXAMPLES

Add a data store called example that uses a locally running Accumulo instance:

geowave store add -t accumulo --zookeeper localhost:2181 --instance accumulo --user root --password secret example

Add a data store called example that uses a locally running HBase instance:

geowave store add -t hbase --zookeeper localhost:2181 example

Add a data store called example that uses a RocksDB database in the current directory:

geowave store add -t rocksdb example

Describe Store

NAME

geowave-store-describe - List properties of a data store

SYNOPSIS

geowave store describe <store name>

DESCRIPTION

This command displays all configuration properties of a given GeoWave data store.

EXAMPLES

List all configuration properties of the example data store:

geowave store describe example

Clear Store

NAME

geowave-store-clear - Clear ALL data from a GeoWave data store and delete tables

SYNOPSIS

geowave store clear <store name>

DESCRIPTION

This command clears ALL data from a GeoWave store and deletes tables.

EXAMPLES

Clear all data from the example data store:

geowave store clear example

Copy Store

NAME

geowave-store-copy - Copy a data store

SYNOPSIS

geowave store copy <input store name> <output store name>

DESCRIPTION

This command copies all of the data from one data store to another.

EXAMPLES

Copy all data from the example data store to the example_copy data store:

geowave store copy example example_copy

Copy Store with MapReduce

NAME

geowave-store-copymr - Copy a data store using MapReduce

SYNOPSIS

geowave store copymr [options] <input store name> <output store name>

DESCRIPTION

This command copies all of the data from one data store to another using MapReduce.

OPTIONS

* --hdfsHostPort <host>

The HDFS host and port.

* --jobSubmissionHostPort <host>

The job submission tracker host and port.

--maxSplits <count>

The maximum partitions for the input data.

--minSplits <count>

The minimum partitions for the input data.

--numReducers <count>

Number of threads writing at a time. Default is 8.

EXAMPLES

Copy all data from the example data store to the example_copy data store using MapReduce:

geowave store copymr --hdfsHostPort localhost:53000 --jobSubmissionHostPort localhost:8032 example example_copy

Copy Store Configuration

NAME

geowave-store-copycfg - Copy and modify existing store configuration

SYNOPSIS

geowave store copycfg [options] <name> <new name> [option_overrides]

DESCRIPTION

This command copies and modifies an existing GeoWave store. It is possible to override configuration options as you copy by specifying the options after the new name, such as store copycfg old new --gwNamespace new_namespace. It is important to note that this command does not copy data, only the data store configuration.

OPTIONS

-d, --default

Makes this the default store in all operations.

EXAMPLES

Copy the example RocksDB data store configuration to example_alt, but with an alternate directory:

geowave store copycfg example example_alt --dir /alternate/directory

List Stores

NAME

geowave-store-list - List all configured data stores

SYNOPSIS

geowave store list

DESCRIPTION

This command displays all configured data stores and their types.

EXAMPLES

List all data stores:

geowave store list

Remove Store

NAME

geowave-store-rm - Removes an existing store from the GeoWave configuration

SYNOPSIS

geowave store rm <store name>

DESCRIPTION

This command removes an existing store from the GeoWave configuration. It does not remove any data from that store.

EXAMPLES

Remove the example store from the configuration:

geowave store rm example

Store Version

NAME

geowave-store-version - Get the version of GeoWave used by a data store

SYNOPSIS

geowave store version <store name>

DESCRIPTION

This command returns the version of GeoWave used by a data store. This is usually the version represented by the server-side libraries being used by the data store.

EXAMPLES

Get the version of GeoWave used by the example data store:

geowave store version example

List Store Plugins

NAME

geowave-store-listplugins - List all available store types

SYNOPSIS

geowave store listplugins

DESCRIPTION

This command lists all of the store types that can be added via the store add command.

EXAMPLES

List all store plugins:

geowave store listplugins

Index Commands

Commands for managing GeoWave indices.

Add Index

NAME

geowave-index-add - Add an index to a data store

SYNOPSIS

geowave index add [options] <store name> <index name>

DESCRIPTION

This command creates an index in a data store if it does not already exist.

OPTIONS

-np, --numPartitions <count>

The number of partitions. Default is 1.

-ps, --partitionStrategy <strategy>

The partition strategy to use. Possible values are NONE, HASH, and ROUND_ROBIN, default is NONE.

* -t, --type <type>

The type of index, such as spatial, temporal, or spatial_temporal.
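To illustrate the idea behind the partitioning options above (a conceptual sketch, not GeoWave's actual key layout): rows are spread across numPartitions groups by a partition prefix; HASH derives the partition deterministically from the row key, while ROUND_ROBIN cycles through partitions in order. The helper functions below are hypothetical:

```python
import itertools
import zlib

# Illustrative partition assignment for the two strategies.
def hash_partition(key: bytes, num_partitions: int) -> int:
    # Deterministic: the same key always lands in the same partition.
    return zlib.crc32(key) % num_partitions

def round_robin_partitions(num_partitions: int):
    # Cycles 0, 1, ..., num_partitions - 1, 0, 1, ...
    return itertools.cycle(range(num_partitions))

keys = [b"row-a", b"row-b", b"row-c", b"row-d"]
print([hash_partition(k, 4) for k in keys])   # stable per key
rr = round_robin_partitions(4)
print([next(rr) for _ in keys])               # [0, 1, 2, 3]
```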

When the spatial type option is used, additional options are:

-c, --crs <crs>

The native Coordinate Reference System used within the index. All spatial data will be projected into this CRS for appropriate indexing as needed. Default is EPSG:4326.

-fp, --fullGeometryPrecision

If specified, geometry will be encoded losslessly. Uses more disk space.

-gp, --geometryPrecision <precision>

The maximum precision of the geometry when encoding. Lower precision will save more disk space when encoding. Possible values are between -8 and 7, default is 7.

--storeTime

If specified, the index will store temporal values. This allows spatial-temporal queries to run slightly more efficiently; however, if spatial-temporal queries are a common use case, a separate spatial-temporal index is recommended.
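To illustrate what the geometry precision setting controls (a sketch under the assumption that a precision of p corresponds roughly to keeping p decimal digits; the helper below is hypothetical, not GeoWave code):

```python
# Illustrative effect of a geometry precision setting on stored
# coordinates. round(x, p) with negative p rounds to the left of the
# decimal point, mirroring the -8..7 range of -gp.
def apply_precision(coords, precision):
    return [(round(x, precision), round(y, precision)) for x, y in coords]

coords = [(102.123456789, -45.987654321)]
print(apply_precision(coords, 7))    # fine precision (the default)
print(apply_precision(coords, 2))    # coarser, saves disk space
print(apply_precision(coords, -1))   # negative: round to nearest 10
```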

When the spatial_temporal type option is used, additional options are:

-c, --crs <crs>

The native Coordinate Reference System used within the index. All spatial data will be projected into this CRS for appropriate indexing as needed. Default is EPSG:4326.

-fp, --fullGeometryPrecision

If specified, geometry will be encoded losslessly. Uses more disk space.

-gp, --geometryPrecision <precision>

The maximum precision of the geometry when encoding. Lower precision will save more disk space when encoding. Possible values are between -8 and 7, default is 7.

--bias <bias>

The bias of the spatial-temporal index. There can be more precision given to time or space if necessary. Possible values are TEMPORAL, BALANCED, and SPATIAL, default is BALANCED.

--maxDuplicates <count>

The maximum number of duplicates per dimension range. The default is 2 per range; for example, lines and polygons with timestamp data can have up to 4 duplicates because there are 2 dimensions, and lines/polygons with time-range data can have up to 8.

--period <periodicity>

The periodicity of the temporal dimension. Because time is continuous, it is binned at this interval. Possible values are MINUTE, HOUR, DAY, WEEK, MONTH, YEAR, and DECADE, default is YEAR.
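A sketch of what binning time at a periodicity means (the helper below is illustrative, not GeoWave code): with a YEAR period, all timestamps in the same year fall into the same bin, while a finer period like MONTH separates them.

```python
from datetime import datetime

# Illustrative temporal binning: timestamps are grouped into bins of
# the configured periodicity.
def time_bin(ts, period):
    if period == "YEAR":
        return (ts.year,)
    if period == "MONTH":
        return (ts.year, ts.month)
    if period == "DAY":
        return (ts.year, ts.month, ts.day)
    raise ValueError(period)

a = datetime(2020, 3, 14)
b = datetime(2020, 11, 2)
print(time_bin(a, "YEAR") == time_bin(b, "YEAR"))    # same yearly bin
print(time_bin(a, "MONTH") == time_bin(b, "MONTH"))  # different monthly bins
```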

When the temporal type option is used, additional options are:

--maxDuplicates <count>

The maximum number of duplicates per dimension range. The default is 2 per range; for example, lines and polygons with timestamp data can have up to 4 duplicates because there are 2 dimensions, and lines/polygons with time-range data can have up to 8.

--period <periodicity>

The periodicity of the temporal dimension. Because time is continuous, it is binned at this interval. Possible values are MINUTE, HOUR, DAY, WEEK, MONTH, YEAR, and DECADE, default is YEAR.

--noTimeRange

If specified, the index will not support time ranges, which can be more efficient.

EXAMPLES

Add a spatial index called spatial_idx with CRS EPSG:3857 to the example data store:

geowave index add -t spatial -c EPSG:3857 example spatial_idx

Add a spatial-temporal index called st_idx with a periodicity of MONTH to the example data store:

geowave index add -t spatial_temporal --period MONTH example st_idx

Compact Index

NAME

geowave-index-compact - Compact all rows for a given index

SYNOPSIS

geowave index compact <store name> <index name>

DESCRIPTION

This command will allow a user to compact all rows for a given index.

EXAMPLES

Compact all rows on the spatial_idx index in the example store:

geowave index compact example spatial_idx

List Indices

NAME

geowave-index-list - Display all indices in a data store

SYNOPSIS

geowave index list <store name>

DESCRIPTION

This command displays all indices in a data store.

EXAMPLES

Display all indices in the example store:

geowave index list example

Remove Index

NAME

geowave-index-rm - Remove an index and all associated data from a data store

SYNOPSIS

geowave index rm <store name> <index name>

DESCRIPTION

This command removes an index and all of its data from a data store.

EXAMPLES

Remove the spatial_idx index from the example store:

geowave index rm example spatial_idx

List Index Plugins

NAME

geowave-index-listplugins - List all available index types

SYNOPSIS

geowave index listplugins

DESCRIPTION

This command lists all of the index types that can be added via the index add command.

EXAMPLES

List all index plugins:

geowave index listplugins

Type Commands

Commands for managing GeoWave types.

List Types

NAME

geowave-type-list - Display all types in a data store

SYNOPSIS

geowave type list <store name>

DESCRIPTION

This command displays all types in a GeoWave data store.

EXAMPLES

Display all types in the example data store:

geowave type list example

Remove Type

NAME

geowave-type-rm - Remove a type and all associated data from a data store

SYNOPSIS

geowave type rm <store name> <type name>

DESCRIPTION

This command removes a type and all associated data from a GeoWave data store.

EXAMPLES

Remove the hail type from the example data store:

geowave type rm example hail

Describe Type

NAME

geowave-type-describe - List attributes of a type in a data store

SYNOPSIS

geowave type describe <store name> <type name>

DESCRIPTION

This command lists attributes of types in a GeoWave data store. For vector types, each attribute and their class are listed. For raster types, only the tile size is listed.

EXAMPLES

Describe the hail type in the example data store:

geowave type describe example hail

Statistics Commands

Commands to manage GeoWave statistics.

Calculate Stat

NAME

geowave-stat-calc - Calculate a specific statistic in a data store, given a type name and stat type

SYNOPSIS

geowave stat calc [options] <store name> <type name> <stat type>

DESCRIPTION

This command calculates a specific statistic in the data store, given a type name and statistic type.

OPTIONS

--fieldName

The field name for the statistic, if the statistic is maintained per field.

--auth <authorizations>

The authorizations used for the statistics calculation. By default all authorizations are used.

--json

If specified, output will be formatted in JSON.

EXAMPLES

Calculate the COUNT_DATA statistic on the hail type in the example data store:

geowave stat calc example hail COUNT_DATA

Calculate the numeric range statistic of the AREA attribute of the hail type in the example data store:

geowave stat calc --fieldName AREA example hail FEATURE_NUMERIC_RANGE

List Stats

NAME

geowave-stat-list - Print statistics of a data store to standard output

SYNOPSIS

geowave stat list [options] <store name>

DESCRIPTION

This command prints statistics of a GeoWave data store (and optionally of a single type) to the standard output.

OPTIONS

--typeName <type>

If specified, only statistics for the given type will be displayed.

--auth <authorizations>

The authorizations used for the statistics calculation. By default all authorizations are used.

--json

If specified, output will be formatted in JSON.

EXAMPLES

List all statistics in the example store:

geowave stat list example

List all statistics for the hail type in the example store in JSON format:

geowave stat list --json --typeName hail example

Compact Stats

NAME

geowave-stat-compact - Combine all statistics in a data store

SYNOPSIS

geowave stat compact <store name>

DESCRIPTION

This command combines all statistics in a GeoWave data store, which can make the data store more efficient.

EXAMPLES

Compact all statistics in the example data store:

geowave stat compact example

Recalculate Stats

NAME

geowave-stat-recalc - Recalculate the statistics in a data store

SYNOPSIS

geowave stat recalc [options] <store name>

DESCRIPTION

This command recalculates the statistics of an existing GeoWave data store. If a type name is provided as an option, only the statistics for that type will be recalculated.

OPTIONS

--typeName <type>

If specified, only statistics for the given type will be recalculated.

--auth <authorizations>

The authorizations used for the statistics calculation. By default all authorizations are used.

--json

If specified, output will be formatted in JSON.

EXAMPLES

Recalculate all of the statistics in the example data store:

geowave stat recalc example

Recalculate all of the statistics for the hail type in the example data store:

geowave stat recalc --typeName hail example

Remove Stat

NAME

geowave-stat-rm - Remove a statistic from a data store

SYNOPSIS

geowave stat rm [options] <store name> <type name> <stat type>

DESCRIPTION

This command removes a statistic from a GeoWave data store.

OPTIONS

--fieldName

The field name for the statistic, if the statistic is maintained per field.

--auth <authorizations>

The authorizations used for the statistics calculation. By default all authorizations are used.

--json

If specified, output will be formatted in JSON.

EXAMPLES

Remove the BOUNDING_BOX statistic of the hail type in the example data store:

geowave stat rm example hail BOUNDING_BOX

Ingest Commands

Commands that ingest data directly into GeoWave or stage data to be ingested into GeoWave.

Ingest Local to GeoWave

NAME

geowave-ingest-localToGW - Ingest supported files from the local file system

SYNOPSIS

geowave ingest localToGW [options] <file or directory> <store name> <comma delimited index list>

DESCRIPTION

This command runs the ingest code (parse to features, load features to GeoWave) against local file system content.

OPTIONS

-t, --threads <count>

Number of threads to use for ingest. Default is 1.

-x, --extension <extensions>

Individual or comma-delimited set of file extensions to accept.

-f, --formats <formats>

Explicitly set the ingest formats by name (or multiple comma-delimited formats). If not set, all available ingest formats will be used.

-v, --visibility <visibility>

The visibility of the data ingested. Default is public.

When the avro format is used, additional options are:

--avro.avro

If specified, indicates that the operation should use Avro feature serialization.

--avro.cql <filter>

An optional CQL filter. If specified, only data matching the filter will be ingested.

--avro.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all type names will be ingested.

--avro.maxVertices <count>

Maximum number of vertices to allow for the feature. Features exceeding this vertex count will be discarded.

--avro.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--avro.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
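Tolerance-based geometry simplification typically works like the classic Douglas-Peucker algorithm: vertices that deviate from the simplified line by less than the tolerance are dropped. The sketch below is illustrative only, not GeoWave's implementation:

```python
import math

# Perpendicular distance from point p to segment a-b.
def point_seg_dist(p, a, b):
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

# Douglas-Peucker: keep the farthest vertex if it exceeds the
# tolerance and recurse on both halves; otherwise drop the interior.
def simplify(points, tolerance):
    if len(points) < 3:
        return list(points)
    dists = [point_seg_dist(p, points[0], points[-1]) for p in points[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__)
    if dists[i] <= tolerance:
        return [points[0], points[-1]]
    left = simplify(points[: i + 2], tolerance)
    right = simplify(points[i + 1 :], tolerance)
    return left[:-1] + right

line = [(0, 0), (1, 0.05), (2, -0.05), (3, 0)]
print(simplify(line, 0.1))   # small wiggles removed
print(simplify(line, 0.01))  # wiggles kept
```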

When the gdelt format is used, additional options are:

--gdelt.avro

A flag to indicate whether Avro feature serialization should be used.

--gdelt.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--gdelt.extended

A flag to indicate whether extended data format should be used.

--gdelt.typename <types>

A comma-delimitted set of type names to ingest, feature types matching the specified type names will be ingested. By default all types will be ingested.

--gdelt.maxVertices <count>

Maximum number of vertices to allow for the feature. Features with over this vertice count will be discarded.

--gdelt.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--gdelt.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (i.e. .1 = 10%). Default is 0.02.

When the geolife format is used, additional options are:

--geolife.avro

A flag to indicate whether Avro feature serialization should be used.

--geolife.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--geolife.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.

--geolife.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--geolife.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--geolife.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the geotools-raster format is used, additional options are:

--geotools-raster.coverage <name>

Coverage name for the raster. Default is the name of the file.

--geotools-raster.crs <crs>

A CRS override for the provided raster file.

--geotools-raster.histogram

If specified, build a histogram of samples per band on ingest for performing band equalization.

--geotools-raster.mergeStrategy <strategy>

The tile merge strategy to use for mosaic. Specifying no-data will mosaic the most recent tile over the previous tiles, except where there are no-data values. By default, none is used.

--geotools-raster.nodata <value>

Optional parameter to set no-data values. If one value is given, it is applied to every band; if multiple values are given, the first totalNoDataValues/totalBands values are applied to the first band, and so on, so each band can have multiple differing no-data values if needed.

--geotools-raster.pyramid

If specified, build an image pyramid on ingest for quick reduced-resolution queries.

--geotools-raster.separateBands

If specified, separate each band into its own coverage name. By default, the coverage name will have _Bn appended to it, where n is the band’s index.

--geotools-raster.tileSize <size>

The tile size of stored tiles. Default is 256.

When the geotools-vector format is used, additional options are:

--geotools-vector.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--geotools-vector.data <fields>

A map of date field names to the date format of the file. Use commas to separate entries; within each entry, the first : character separates the field name from the format. Use \, to include a comma in the format. For example, time:MM:dd:YYYY,time2:YYYY/MM/dd hh:mm:ss configures the fields time and time2 as dates with different formats.

--geotools-vector.type <types>

Optional parameter that specifies specific type name(s) from the source file.

When the gpx format is used, additional options are:

--gpx.avro

A flag to indicate whether Avro feature serialization should be used.

--gpx.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--gpx.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.

--gpx.maxLength <degrees>

Maximum extent (in both dimensions) for a GPX track, in degrees. Used to remove excessively long GPX tracks.

--gpx.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--gpx.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--gpx.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the tdrive format is used, additional options are:

--tdrive.avro

A flag to indicate whether Avro feature serialization should be used.

--tdrive.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--tdrive.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.

--tdrive.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--tdrive.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--tdrive.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the twitter format is used, additional options are:

--twitter.avro

A flag to indicate whether Avro feature serialization should be used.

--twitter.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--twitter.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.

--twitter.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--twitter.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--twitter.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

EXAMPLES

Ingest GDELT data from an area around Germany from the gdelt_data directory into a GeoWave data store called example in the spatial-idx index:

geowave ingest localToGW -f gdelt --gdelt.cql "BBOX(geometry,5.87,47.2,15.04,54.95)" ./gdelt_data example spatial-idx

Ingest a shapefile called states.shp into the example data store in the spatial-idx index:

geowave ingest localToGW -f geotools-vector states.shp example spatial-idx

Ingest Kafka to GeoWave

NAME

geowave-ingest-kafkaToGW - Subscribe to a Kafka topic and ingest into GeoWave

SYNOPSIS

geowave ingest kafkaToGW [options] <store name> <comma delimited index list>

DESCRIPTION

This command ingests data from a Kafka topic into GeoWave.

OPTIONS

--autoOffsetReset <offset>

What to do when there is no initial offset in ZooKeeper or if an offset is out of range. If smallest is used, automatically reset the offset to the smallest offset. If largest is used, automatically reset the offset to the largest offset. Otherwise, throw an exception to the consumer.

--batchSize <size>

The data will automatically flush after this number of entries. Default is 10,000.

--consumerTimeoutMs <timeout>

By default, this value is -1 and a consumer blocks indefinitely if no new message is available for consumption. By setting the value to a positive integer, a timeout exception is thrown to the consumer if no message is available for consumption after the specified timeout value.

--fetchMessageMaxBytes <bytes>

The number of bytes of messages to attempt to fetch for each topic-partition in each fetch request. These bytes will be read into memory for each partition, so this helps control the memory used by the consumer. The fetch request size must be at least as large as the maximum message size the server allows or else it is possible for the producer to send messages larger than the consumer can fetch.

--groupId <id>

A string that uniquely identifies the group of consumer processes to which this consumer belongs. By setting the same group ID, multiple processes indicate that they are all part of the same consumer group.

* --kafkaprops <file>

Properties file containing Kafka properties.

--reconnectOnTimeout

If specified, when the consumer timeout occurs (based on the Kafka property consumer.timeout.ms), a flush will occur and the consumer will immediately reconnect.

-x, --extension <extensions>

Individual or comma-delimited set of file extensions to accept.

-f, --formats <formats>

Explicitly set the ingest formats by name (or multiple comma-delimited formats). If not set, all available ingest formats will be used.

-v, --visibility <visibility>

The visibility of the data ingested. Default is public.

--zookeeperConnect <host>

Specifies the ZooKeeper connection string in the form hostname:port where host and port are the host and port of a ZooKeeper server. To allow connecting through other ZooKeeper nodes when that ZooKeeper machine is down you can also specify multiple hosts in the form hostname1:port1,hostname2:port2,hostname3:port3.

When the avro format is used, additional options are:

--avro.avro

If specified, indicates that the operation should use Avro feature serialization.

--avro.cql <filter>

An optional CQL filter. If specified, only data matching the filter will be ingested.

--avro.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all type names will be ingested.

--avro.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--avro.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--avro.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the gdelt format is used, additional options are:

--gdelt.avro

A flag to indicate whether Avro feature serialization should be used.

--gdelt.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--gdelt.extended

A flag to indicate whether the extended data format should be used.

--gdelt.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.

--gdelt.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--gdelt.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--gdelt.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the geolife format is used, additional options are:

--geolife.avro

A flag to indicate whether Avro feature serialization should be used.

--geolife.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--geolife.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.

--geolife.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--geolife.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--geolife.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the geotools-raster format is used, additional options are:

--geotools-raster.coverage <name>

Coverage name for the raster. Default is the name of the file.

--geotools-raster.crs <crs>

A CRS override for the provided raster file.

--geotools-raster.histogram

If specified, build a histogram of samples per band on ingest for performing band equalization.

--geotools-raster.mergeStrategy <strategy>

The tile merge strategy to use for mosaic. Specifying no-data will mosaic the most recent tile over the previous tiles, except where there are no-data values. By default, none is used.

--geotools-raster.nodata <value>

Optional parameter to set no-data values. If one value is given, it is applied to every band; if multiple values are given, the first totalNoDataValues/totalBands values are applied to the first band, and so on, so each band can have multiple differing no-data values if needed.

--geotools-raster.pyramid

If specified, build an image pyramid on ingest for quick reduced-resolution queries.

--geotools-raster.separateBands

If specified, separate each band into its own coverage name. By default, the coverage name will have _Bn appended to it, where n is the band’s index.

--geotools-raster.tileSize <size>

The tile size of stored tiles. Default is 256.

When the geotools-vector format is used, additional options are:

--geotools-vector.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--geotools-vector.data <fields>

A map of date field names to the date format of the file. Use commas to separate entries; within each entry, the first : character separates the field name from the format. Use \, to include a comma in the format. For example, time:MM:dd:YYYY,time2:YYYY/MM/dd hh:mm:ss configures the fields time and time2 as dates with different formats.

--geotools-vector.type <types>

Optional parameter that specifies specific type name(s) from the source file.

When the gpx format is used, additional options are:

--gpx.avro

A flag to indicate whether Avro feature serialization should be used.

--gpx.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--gpx.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.

--gpx.maxLength <degrees>

Maximum extent (in both dimensions) for a GPX track, in degrees. Used to remove excessively long GPX tracks.

--gpx.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--gpx.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--gpx.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the tdrive format is used, additional options are:

--tdrive.avro

A flag to indicate whether Avro feature serialization should be used.

--tdrive.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--tdrive.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.

--tdrive.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--tdrive.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--tdrive.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the twitter format is used, additional options are:

--twitter.avro

A flag to indicate whether Avro feature serialization should be used.

--twitter.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--twitter.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.

--twitter.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--twitter.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--twitter.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
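
EXAMPLES

The kafkaToGW command requires a Kafka consumer properties file. A minimal sketch of such a file follows; the property names are the standard Kafka consumer settings mirrored by the flags above, and the values are illustrative:

zookeeper.connect=localhost:2181
group.id=geowave-ingest
auto.offset.reset=smallest

Subscribe to a Kafka topic and ingest GPX data into the example data store in the spatial-idx index, assuming the properties file above is saved as kafka.properties:

geowave ingest kafkaToGW --kafkaprops kafka.properties -f gpx example spatial-idx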

Stage Local to HDFS

NAME

geowave-ingest-localToHdfs - Stage supported files in local file system to HDFS

SYNOPSIS

geowave ingest localToHdfs [options] <file or directory> <hdfs host:port> <path to base directory to write to>

DESCRIPTION

This command stages supported files in the local file system to HDFS.

OPTIONS

-x, --extension <extensions>

Individual or comma-delimited set of file extensions to accept.

-f, --formats <formats>

Explicitly set the ingest formats by name (or multiple comma-delimited formats). If not set, all available ingest formats will be used.

When the avro format is used, additional options are:

--avro.avro

If specified, indicates that the operation should use Avro feature serialization.

--avro.cql <filter>

An optional CQL filter. If specified, only data matching the filter will be ingested.

--avro.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all type names will be ingested.

--avro.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--avro.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--avro.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the gdelt format is used, additional options are:

--gdelt.avro

A flag to indicate whether Avro feature serialization should be used.

--gdelt.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--gdelt.extended

A flag to indicate whether the extended data format should be used.

--gdelt.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.

--gdelt.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--gdelt.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--gdelt.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the geolife format is used, additional options are:

--geolife.avro

A flag to indicate whether Avro feature serialization should be used.

--geolife.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--geolife.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.

--geolife.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--geolife.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--geolife.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the geotools-raster format is used, additional options are:

--geotools-raster.coverage <name>

Coverage name for the raster. Default is the name of the file.

--geotools-raster.crs <crs>

A CRS override for the provided raster file.

--geotools-raster.histogram

If specified, build a histogram of samples per band on ingest for performing band equalization.

--geotools-raster.mergeStrategy <strategy>

The tile merge strategy to use for mosaic. Specifying no-data will mosaic the most recent tile over the previous tiles, except where there are no-data values. By default, none is used.

--geotools-raster.nodata <value>

Optional parameter to set no-data values. If one value is given, it is applied to every band; if multiple values are given, the first totalNoDataValues/totalBands values are applied to the first band, and so on, so each band can have multiple differing no-data values if needed.

--geotools-raster.pyramid

If specified, build an image pyramid on ingest for quick reduced-resolution queries.

--geotools-raster.separateBands

If specified, separate each band into its own coverage name. By default, the coverage name will have _Bn appended to it, where n is the band’s index.

--geotools-raster.tileSize <size>

The tile size of stored tiles. Default is 256.

When the geotools-vector format is used, additional options are:

--geotools-vector.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--geotools-vector.data <fields>

A map of date field names to the date format of the file. Use commas to separate entries; within each entry, the first : character separates the field name from the format. Use \, to include a comma in the format. For example, time:MM:dd:YYYY,time2:YYYY/MM/dd hh:mm:ss configures the fields time and time2 as dates with different formats.

--geotools-vector.type <types>

Optional parameter that specifies specific type name(s) from the source file.

When the gpx format is used, additional options are:

--gpx.avro

A flag to indicate whether Avro feature serialization should be used.

--gpx.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--gpx.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.

--gpx.maxLength <degrees>

Maximum extent (in both dimensions) for a GPX track, in degrees. Used to remove excessively long GPX tracks.

--gpx.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--gpx.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--gpx.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the tdrive format is used, additional options are:

--tdrive.avro

A flag to indicate whether Avro feature serialization should be used.

--tdrive.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--tdrive.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.

--tdrive.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--tdrive.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--tdrive.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the twitter format is used, additional options are:

--twitter.avro

A flag to indicate whether Avro feature serialization should be used.

--twitter.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--twitter.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.

--twitter.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--twitter.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--twitter.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
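
EXAMPLES

Stage GPX files from the local gpx_data directory to the /data/gpx directory in HDFS. The host name, port, and paths are illustrative (8020 is a common HDFS NameNode RPC port):

geowave ingest localToHdfs -f gpx ./gpx_data hdfs-host:8020 /data/gpx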

Stage Local to Kafka

NAME

geowave-ingest-localToKafka - Stage supported files in local file system to a Kafka topic

SYNOPSIS

geowave ingest localToKafka [options] <file or directory>

DESCRIPTION

This command stages supported files in the local file system to a Kafka topic.

OPTIONS

* --kafkaprops <file>

Properties file containing Kafka properties.

--metadataBrokerList <brokers>

This is for bootstrapping and the producer will only use it for getting metadata (topics, partitions and replicas). The socket connections for sending the actual data will be established based on the broker information returned in the metadata. The format is host1:port1,host2:port2, and the list can be a subset of brokers or a VIP pointing to a subset of brokers.

--producerType <type>

This parameter specifies whether the messages are sent asynchronously in a background thread. Valid values are async for asynchronous send and sync for synchronous send. By setting the producer to async we allow batching together of requests (which is great for throughput) but open the possibility of a failure of the client machine dropping unsent data.

--requestRequiredAcks <count>

This value controls when a produce request is considered completed. Specifically, how many other brokers must have committed the data to their log and acknowledged this to the leader?

--retryBackoffMs <time>

The amount of time to wait before attempting to retry a failed produce request to a given topic partition. This avoids repeated sending-and-failing in a tight loop.

--serializerClass <class>

The serializer class for messages. The default encoder takes a byte[] and returns the same byte[].

-x, --extension <extensions>

Individual or comma-delimited set of file extensions to accept.

-f, --formats <formats>

Explicitly set the ingest formats by name (or multiple comma-delimited formats). If not set, all available ingest formats will be used.

When the avro format is used, additional options are:

--avro.avro

If specified, indicates that the operation should use Avro feature serialization.

--avro.cql <filter>

An optional CQL filter. If specified, only data matching the filter will be ingested.

--avro.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all type names will be ingested.

--avro.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--avro.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--avro.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the gdelt format is used, additional options are:

--gdelt.avro

A flag to indicate whether Avro feature serialization should be used.

--gdelt.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--gdelt.extended

A flag to indicate whether the extended data format should be used.

--gdelt.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.

--gdelt.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--gdelt.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--gdelt.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the geolife format is used, additional options are:

--geolife.avro

A flag to indicate whether Avro feature serialization should be used.

--geolife.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--geolife.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.

--geolife.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--geolife.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--geolife.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the geotools-raster format is used, additional options are:

--geotools-raster.coverage <name>

Coverage name for the raster. Default is the name of the file.

--geotools-raster.crs <crs>

A CRS override for the provided raster file.

--geotools-raster.histogram

If specified, build a histogram of samples per band on ingest for performing band equalization.

--geotools-raster.mergeStrategy <strategy>

The tile merge strategy to use for mosaic. Specifying no-data will mosaic the most recent tile over the previous tiles, except where there are no-data values. By default, none is used.

--geotools-raster.nodata <value>

Optional parameter to set no-data values. If one value is given, it is applied to every band; if multiple values are given, the first totalNoDataValues/totalBands values are applied to the first band, and so on, so each band can have multiple differing no-data values if needed.

--geotools-raster.pyramid

If specified, build an image pyramid on ingest for quick reduced-resolution queries.

--geotools-raster.separateBands

If specified, separate each band into its own coverage name. By default, the coverage name will have _Bn appended to it, where n is the band’s index.

--geotools-raster.tileSize <size>

The tile size of stored tiles. Default is 256.

When the geotools-vector format is used, additional options are:

--geotools-vector.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--geotools-vector.data <fields>

A map of date field names to the date format of the file. Use commas to separate entries; within each entry, the first : character separates the field name from the format. Use \, to include a comma in the format. For example, time:MM:dd:YYYY,time2:YYYY/MM/dd hh:mm:ss configures the fields time and time2 as dates with different formats.

--geotools-vector.type <types>

Optional parameter that specifies specific type name(s) from the source file.

When the gpx format is used, additional options are:

--gpx.avro

A flag to indicate whether Avro feature serialization should be used.

--gpx.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--gpx.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.

--gpx.maxLength <degrees>

Maximum extent (in both dimensions) for a GPX track, in degrees. Used to remove excessively long GPX tracks.

--gpx.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--gpx.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--gpx.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the tdrive format is used, additional options are:

--tdrive.avro

A flag to indicate whether Avro feature serialization should be used.

--tdrive.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--tdrive.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.

--tdrive.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--tdrive.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--tdrive.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the twitter format is used, additional options are:

--twitter.avro

A flag to indicate whether Avro feature serialization should be used.

--twitter.cql <filter>

A CQL filter; only data matching this filter will be ingested.

--twitter.typename <types>

A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.

--twitter.maxVertices <count>

Maximum number of vertices to allow for a feature. Features that exceed this vertex count will be discarded.

--twitter.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--twitter.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

Ingest Local to GeoWave with MapReduce

NAME

geowave-ingest-localToMrGW - Copy supported files from local file system to HDFS and ingest from HDFS

SYNOPSIS

geowave ingest localToMrGW [options] <file or directory> <hdfs host:port> <path to base directory to write to> <store name> <comma delimited index list>

DESCRIPTION

This command copies supported files from local file system to HDFS and then ingests from HDFS.

OPTIONS

--jobtracker <host>

Hadoop job tracker hostname and port in the format hostname:port.

--resourceman <host>

Yarn resource manager hostname and port in the format hostname:port.

-x, --extension <extensions>

Individual or comma-delimited set of file extensions to accept.

-f, --formats <formats>

Explicitly set the ingest formats by name (or multiple comma-delimited formats). If not set, all available ingest formats will be used.

-v, --visibility <visibility>

The visibility of the data ingested. Default is public.

When the avro format is used, additional options are:

--avro.avro

If specified, indicates that the operation should use Avro feature serialization.

--avro.cql <filter>

An optional CQL filter. If specified, only data matching the filter will be ingested.

--avro.typename <types>

A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all type names will be ingested.

--avro.maxVertices <count>

Maximum number of vertices to allow for the feature. Features exceeding this vertex count will be discarded.

--avro.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--avro.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the gdelt format is used, additional options are:

--gdelt.avro

A flag to indicate whether Avro feature serialization should be used.

--gdelt.cql <filter>

An optional CQL filter; only data matching this filter will be ingested.

--gdelt.extended

A flag to indicate whether the extended data format should be used.

--gdelt.typename <types>

A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.

--gdelt.maxVertices <count>

Maximum number of vertices to allow for the feature. Features exceeding this vertex count will be discarded.

--gdelt.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--gdelt.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the geolife format is used, additional options are:

--geolife.avro

A flag to indicate whether Avro feature serialization should be used.

--geolife.cql <filter>

An optional CQL filter; only data matching this filter will be ingested.

--geolife.typename <types>

A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.

--geolife.maxVertices <count>

Maximum number of vertices to allow for the feature. Features exceeding this vertex count will be discarded.

--geolife.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--geolife.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the geotools-raster format is used, additional options are:

--geotools-raster.coverage <name>

Coverage name for the raster. Default is the name of the file.

--geotools-raster.crs <crs>

A CRS override for the provided raster file.

--geotools-raster.histogram

If specified, build a histogram of samples per band on ingest for performing band equalization.

--geotools-raster.mergeStrategy <strategy>

The tile merge strategy to use for mosaicking. Specifying no-data will mosaic the most recent tile over previous tiles, except where the most recent tile contains no-data values. Default is none.

--geotools-raster.nodata <value>

Optional parameter to set no-data values. If one value is given, it is applied to every band; if multiple values are given, the first totalNoDataValues/totalBands values are applied to the first band, and so on. Each band can therefore have multiple differing no-data values if needed.
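
The per-band assignment rule above can be sketched as follows. This is a hypothetical helper, not GeoWave API; the function name is made up, and the single-value and grouping behavior follow the description above.

```python
def assign_nodata(values, num_bands):
    """Distribute a flat list of no-data values across bands.

    Illustrative sketch of the rule described above: a single value is
    applied to every band; otherwise the first len(values) / num_bands
    values go to the first band, the next group to the second, and so on.
    """
    if len(values) == 1:
        return [list(values)] * num_bands
    per_band = len(values) // num_bands
    return [values[i * per_band:(i + 1) * per_band]
            for i in range(num_bands)]
```

For example, six values across three bands would give each band two no-data values.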

--geotools-raster.pyramid

If specified, build an image pyramid on ingest for quick reduced resolution query.

--geotools-raster.separateBands

If specified, separate each band into its own coverage. Each band’s coverage name will have _Bn appended to it, where n is the band’s index.

--geotools-raster.tileSize <size>

The tile size of stored tiles. Default is 256.

When the geotools-vector format is used, additional options are:

--geotools-vector.cql <filter>

An optional CQL filter; only data matching this filter will be ingested.

--geotools-vector.data <fields>

A map of date field names to their date formats. Use commas to separate entries; within each entry, the first : character separates the field name from the format. Use \, to include a literal comma in a format. For example, time:MM:dd:YYYY,time2:YYYY/MM/dd hh:mm:ss configures the fields time and time2 as dates with different formats.
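
The entry syntax above can be illustrated with a small parser. This is a hypothetical sketch, not GeoWave code; it only demonstrates the comma, colon, and escape rules described.

```python
import re


def parse_date_fields(spec):
    """Parse a field-to-date-format mapping per the rules above.

    Entries are separated by unescaped commas; within each entry the
    first ':' splits the field name from its format; '\\,' yields a
    literal comma inside a format.
    """
    result = {}
    for entry in re.split(r'(?<!\\),', spec):
        name, fmt = entry.split(':', 1)
        result[name] = fmt.replace('\\,', ',')
    return result
```

Given the example string from the option description, this yields time mapped to MM:dd:YYYY and time2 mapped to YYYY/MM/dd hh:mm:ss.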

--geotools-vector.type <types>

Optional parameter that specifies specific type name(s) from the source file.

When the gpx format is used, additional options are:

--gpx.avro

A flag to indicate whether Avro feature serialization should be used.

--gpx.cql <filter>

An optional CQL filter; only data matching this filter will be ingested.

--gpx.typename <types>

A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.

--gpx.maxLength <degrees>

Maximum extent (in both dimensions) for a GPX track in degrees. Used to remove excessively long GPX tracks.

--gpx.maxVertices <count>

Maximum number of vertices to allow for the feature. Features exceeding this vertex count will be discarded.

--gpx.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--gpx.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the tdrive format is used, additional options are:

--tdrive.avro

A flag to indicate whether Avro feature serialization should be used.

--tdrive.cql <filter>

An optional CQL filter; only data matching this filter will be ingested.

--tdrive.typename <types>

A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.

--tdrive.maxVertices <count>

Maximum number of vertices to allow for the feature. Features exceeding this vertex count will be discarded.

--tdrive.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--tdrive.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the twitter format is used, additional options are:

--twitter.avro

A flag to indicate whether Avro feature serialization should be used.

--twitter.cql <filter>

An optional CQL filter; only data matching this filter will be ingested.

--twitter.typename <types>

A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.

--twitter.maxVertices <count>

Maximum number of vertices to allow for the feature. Features exceeding this vertex count will be discarded.

--twitter.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--twitter.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

Ingest MapReduce to GeoWave

NAME

geowave-ingest-mrToGW - Ingest supported files that already exist in HDFS

SYNOPSIS

geowave ingest mrToGW [options] <hdfs host:port> <path to base directory to write to> <store name> <comma delimited index list>

DESCRIPTION

This command ingests supported files that already exist in HDFS to GeoWave.

OPTIONS

--jobtracker <host>

Hadoop job tracker hostname and port in the format hostname:port.

--resourceman <host>

Yarn resource manager hostname and port in the format hostname:port.

-x, --extension <extensions>

Individual or comma-delimited set of file extensions to accept.

-f, --formats <formats>

Explicitly set the ingest formats by name (or multiple comma-delimited formats). If not set, all available ingest formats will be used.

-v, --visibility <visibility>

The visibility of the data ingested. Default is public.

When the avro format is used, additional options are:

--avro.avro

If specified, indicates that the operation should use Avro feature serialization.

--avro.cql <filter>

An optional CQL filter. If specified, only data matching the filter will be ingested.

--avro.typename <types>

A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all type names will be ingested.

--avro.maxVertices <count>

Maximum number of vertices to allow for the feature. Features exceeding this vertex count will be discarded.

--avro.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--avro.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the gdelt format is used, additional options are:

--gdelt.avro

A flag to indicate whether Avro feature serialization should be used.

--gdelt.cql <filter>

An optional CQL filter; only data matching this filter will be ingested.

--gdelt.extended

A flag to indicate whether the extended data format should be used.

--gdelt.typename <types>

A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.

--gdelt.maxVertices <count>

Maximum number of vertices to allow for the feature. Features exceeding this vertex count will be discarded.

--gdelt.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--gdelt.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the geolife format is used, additional options are:

--geolife.avro

A flag to indicate whether Avro feature serialization should be used.

--geolife.cql <filter>

An optional CQL filter; only data matching this filter will be ingested.

--geolife.typename <types>

A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.

--geolife.maxVertices <count>

Maximum number of vertices to allow for the feature. Features exceeding this vertex count will be discarded.

--geolife.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--geolife.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the geotools-raster format is used, additional options are:

--geotools-raster.coverage <name>

Coverage name for the raster. Default is the name of the file.

--geotools-raster.crs <crs>

A CRS override for the provided raster file.

--geotools-raster.histogram

If specified, build a histogram of samples per band on ingest for performing band equalization.

--geotools-raster.mergeStrategy <strategy>

The tile merge strategy to use for mosaicking. Specifying no-data will mosaic the most recent tile over previous tiles, except where the most recent tile contains no-data values. Default is none.

--geotools-raster.nodata <value>

Optional parameter to set no-data values. If one value is given, it is applied to every band; if multiple values are given, the first totalNoDataValues/totalBands values are applied to the first band, and so on. Each band can therefore have multiple differing no-data values if needed.

--geotools-raster.pyramid

If specified, build an image pyramid on ingest for quick reduced resolution query.

--geotools-raster.separateBands

If specified, separate each band into its own coverage. Each band’s coverage name will have _Bn appended to it, where n is the band’s index.

--geotools-raster.tileSize <size>

The tile size of stored tiles. Default is 256.

When the geotools-vector format is used, additional options are:

--geotools-vector.cql <filter>

An optional CQL filter; only data matching this filter will be ingested.

--geotools-vector.data <fields>

A map of date field names to their date formats. Use commas to separate entries; within each entry, the first : character separates the field name from the format. Use \, to include a literal comma in a format. For example, time:MM:dd:YYYY,time2:YYYY/MM/dd hh:mm:ss configures the fields time and time2 as dates with different formats.

--geotools-vector.type <types>

Optional parameter that specifies specific type name(s) from the source file.

When the gpx format is used, additional options are:

--gpx.avro

A flag to indicate whether Avro feature serialization should be used.

--gpx.cql <filter>

An optional CQL filter; only data matching this filter will be ingested.

--gpx.typename <types>

A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.

--gpx.maxLength <degrees>

Maximum extent (in both dimensions) for a GPX track in degrees. Used to remove excessively long GPX tracks.

--gpx.maxVertices <count>

Maximum number of vertices to allow for the feature. Features exceeding this vertex count will be discarded.

--gpx.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--gpx.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the tdrive format is used, additional options are:

--tdrive.avro

A flag to indicate whether Avro feature serialization should be used.

--tdrive.cql <filter>

An optional CQL filter; only data matching this filter will be ingested.

--tdrive.typename <types>

A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.

--tdrive.maxVertices <count>

Maximum number of vertices to allow for the feature. Features exceeding this vertex count will be discarded.

--tdrive.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--tdrive.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

When the twitter format is used, additional options are:

--twitter.avro

A flag to indicate whether Avro feature serialization should be used.

--twitter.cql <filter>

An optional CQL filter; only data matching this filter will be ingested.

--twitter.typename <types>

A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.

--twitter.maxVertices <count>

Maximum number of vertices to allow for the feature. Features exceeding this vertex count will be discarded.

--twitter.minSimpVertices <count>

Minimum vertex count to qualify for geometry simplification.

--twitter.tolerance <tolerance>

Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.

Ingest Spark to GeoWave

NAME

geowave-ingest-sparkToGW - Ingest supported files that already exist in HDFS or S3 using Spark

SYNOPSIS

geowave ingest sparkToGW [options] <input directory> <store name> <comma delimited index list>

DESCRIPTION

This command ingests supported files that already exist in HDFS or S3 using Spark.

OPTIONS

-ho, --hosts <host>

The Spark driver host. Default is localhost.

-m, --master <designation>

The Spark master designation. Default is local.

-n, --name <name>

The Spark application name. Default is Spark Ingest.

-c, --numcores <count>

The number of cores to use.

-e, --numexecutors <count>

The number of executors to use.

-x, --extension <extensions>

Individual or comma-delimited set of file extensions to accept.

-f, --formats <formats>

Explicitly set the ingest formats by name (or multiple comma-delimited formats). If not set, all available ingest formats will be used.

-v, --visibility <visibility>

The visibility of the data ingested. Default is public.

List Ingest Plugins

NAME

geowave-ingest-listplugins - List supported ingest formats

SYNOPSIS

geowave ingest listplugins

DESCRIPTION

This command will list all ingest formats supported by the version of GeoWave being run.

EXAMPLES

List all ingest plugins:

geowave ingest listplugins

Analytic Commands

Commands that run MapReduce or Spark processing to enhance an existing GeoWave dataset.

The commands below can also be run as a Yarn or Hadoop API command (i.e. mapreduce).

For instance, to run the analytic using Yarn:

yarn jar geowave-tools.jar analytic <algorithm> <options> <store>

Density-Based Scan

NAME

geowave-analytic-dbscan - Density-Based Scanner

SYNOPSIS

geowave analytic dbscan [options] <storename>

DESCRIPTION

This command runs a density based scanner analytic on GeoWave data.

OPTIONS

-conf, --mapReduceConfigFile <file>

MapReduce configuration file.

* -hdfsbase, --mapReduceHdfsBaseDir <path>

Fully qualified path to the base directory in HDFS.

* -jobtracker, --mapReduceJobtrackerHostPort <host>

[REQUIRED (or -resourceman)] Hadoop job tracker hostname and port in the format hostname:port.

* -resourceman, --mapReduceYarnResourceManager <host>

[REQUIRED (or -jobtracker)] Yarn resource manager hostname and port in the format hostname:port.

-hdfs, --mapReduceHdfsHostPort <host>

HDFS hostname and port in the format hostname:port.

--cdf, --commonDistanceFunctionClass <class>

Distance function class that implements org.locationtech.geowave.analytics.distance.DistanceFn.

* --query.typeNames <types>

The comma-separated list of types to query; by default all types are used.

--query.auth <auths>

The comma-separated list of authorizations used during extract; by default all authorizations are used.

--query.index <index>

The specific index to query; by default one is chosen for each adapter.

* -emx, --extractMaxInputSplit <size>

Maximum HDFS input split size.

* -emn, --extractMinInputSplit <size>

Minimum HDFS input split size.

-eq, --extractQuery <query>

The query used to limit the records extracted from GeoWave.

-ofc, --outputOutputFormat <class>

Output format class.

-ifc, --inputFormatClass <class>

Input format class.

-orc, --outputReducerCount <count>

Number of reducers for output.

* -cmi, --clusteringMaxIterations <count>

Maximum number of iterations when finding optimal clusters.

* -cms, --clusteringMinimumSize <size>

Minimum cluster size.

* -pmd, --partitionMaxDistance <distance>

Maximum partition distance.

-b, --globalBatchId <id>

Batch ID.

-hdt, --hullDataTypeId <id>

Data Type ID for a centroid item.

-hpe, --hullProjectionClass <class>

Class to project on to 2D space. Implements org.locationtech.geowave.analytics.tools.Projection.

-ons, --outputDataNamespaceUri <namespace>

Output namespace for objects that will be written to GeoWave.

-odt, --outputDataTypeId <id>

Output Data ID assigned to objects that will be written to GeoWave.

-oop, --outputHdfsOutputPath <path>

Output HDFS file path.

-oid, --outputIndexId <index>

Output index for objects that will be written to GeoWave.

-pdt, --partitionDistanceThresholds <thresholds>

Comma separated list of distance thresholds, per dimension.

-pdu, --partitionGeometricDistanceUnit <unit>

Geometric distance unit (m=meters, km=kilometers; see symbols for javax.units.BaseUnit).

-pms, --partitionMaxMemberSelection <count>

Maximum number of members selected from a partition.

-pdr, --partitionPartitionDecreaseRate <rate>

Rate of decrease for precision (within the range (0,1]).

-pp, --partitionPartitionPrecision <precision>

Partition precision.

-pc, --partitionPartitionerClass <class>

The partitioner class to use for primary partitioning of centroids.

-psp, --partitionSecondaryPartitionerClass <class>

Perform secondary partitioning with the provided class.

EXAMPLES

Run through a maximum of 5 iterations (-cmi), with a minimum cluster size of 10 (-cms), a minimum HDFS input split of 2 (-emn), a maximum HDFS input split of 6 (-emx), and a maximum search distance of 1000 meters (-pmd). The reducer count is 4 (-orc), the HDFS IPC port is localhost:53000 (-hdfs), the job tracker is at localhost:8032 (-jobtracker), the temporary files needed by this job are stored under the HDFS base directory /user/rwgdrummer (-hdfsbase), the data type used is gpxpoint (--query.typeNames), and the data store connection parameters are loaded from my_store.

geowave analytic dbscan -cmi 5 -cms 10 -emn 2 -emx 6 -pmd 1000 -orc 4 -hdfs localhost:53000 -jobtracker localhost:8032 -hdfsbase /user/rwgdrummer --query.typeNames gpxpoint my_store

EXECUTION

DBSCAN uses GeoWaveInputFormat to load data from GeoWave into HDFS. You can use the extract query parameter to limit the records used in the analytic.

It iteratively calls Nearest Neighbor to execute a sequence of concave hulls. The hulls are saved into sequence files written to a temporary HDFS directory, and then read in again for the next DBSCAN iteration.

After completion, the data is written back from HDFS to Accumulo using a job called the "input load runner".

Kernel Density Estimate

NAME

geowave-analytic-kde - Kernel Density Estimate

SYNOPSIS

geowave analytic kde [options] <input store name> <output store name>

DESCRIPTION

This command runs a Kernel Density Estimate analytic on GeoWave data.

OPTIONS

* --coverageName <name>

The output coverage name.

* --featureType <type>

The name of the feature type to run a KDE on.

* --minLevel <level>

The minimum zoom level to run a KDE at.

* --maxLevel <level>

The maximum zoom level to run a KDE at.

--minSplits <count>

The minimum partitions for the input data.

--maxSplits <count>

The maximum partitions for the input data.

--tileSize <size>

The size of output tiles.

--cqlFilter <filter>

An optional CQL filter applied to the input data.

--indexName <index>

An optional index to filter the input data.

--outputIndex <index>

An optional index for output data store. Only spatial index type is supported.

--hdfsHostPort <host>

The HDFS host and port.

* --jobSubmissionHostPort <host>

The job submission tracker host and port in the format hostname:port.

EXAMPLES

Perform a Kernel Density Estimation on the gdeltevent type using a local resource manager at port 8032. The KDE is run at zoom levels 5-26, and the resulting raster is stored under the type name gdeltevent_kde. The input and output data store is called gdelt.

geowave analytic kde --featureType gdeltevent --jobSubmissionHostPort localhost:8032 --minLevel 5 --maxLevel 26 --coverageName gdeltevent_kde gdelt gdelt

Kernel Density Estimate on Spark

NAME

geowave-analytic-kdespark - Kernel Density Estimate using Spark

SYNOPSIS

geowave analytic kdespark [options] <input store name> <output store name>

DESCRIPTION

This command runs a Kernel Density Estimate analytic on GeoWave data using Apache Spark.

OPTIONS

* --coverageName <name>

The output coverage name.

* --featureType <type>

The name of the feature type to run a KDE on.

* --minLevel <level>

The minimum zoom level to run a KDE at.

* --maxLevel <level>

The maximum zoom level to run a KDE at.

--minSplits <count>

The minimum partitions for the input data.

--maxSplits <count>

The maximum partitions for the input data.

--tileSize <size>

The size of output tiles.

--cqlFilter <filter>

An optional CQL filter applied to the input data.

--indexName <index>

An optional index name to filter the input data.

--outputIndex <index>

An optional index for output data store. Only spatial index type is supported.

-n, --name <name>

The Spark application name.

-ho, --host <host>

The Spark driver host.

-m, --master <designation>

The Spark master designation.

EXAMPLES

Perform a Kernel Density Estimation on the gdeltevent type using a local Spark cluster. The KDE is run at zoom levels 5-26, and the resulting raster is stored under the type name gdeltevent_kde. The input and output data store is called gdelt.

geowave analytic kdespark --featureType gdeltevent -m local --minLevel 5 --maxLevel 26 --coverageName gdeltevent_kde gdelt gdelt

K-means Jump

NAME

geowave-analytic-kmeansjump - KMeans Clustering using Jump Method

SYNOPSIS

geowave analytic kmeansjump [options] <store name>

DESCRIPTION

This command executes a KMeans Clustering analytic using a Jump Method.

OPTIONS

-conf, --mapReduceConfigFile <file>

MapReduce configuration file.

* -hdfsbase, --mapReduceHdfsBaseDir <path>

Fully qualified path to the base directory in HDFS.

* -jobtracker, --mapReduceJobtrackerHostPort <host>

[REQUIRED (or -resourceman)] Hadoop job tracker hostname and port in the format hostname:port.

* -resourceman, --mapReduceYarnResourceManager <host>

[REQUIRED (or -jobtracker)] Yarn resource manager hostname and port in the format hostname:port.

-hdfs, --mapReduceHdfsHostPort <host>

HDFS hostname and port in the format hostname:port.

--cdf, --commonDistanceFunctionClass <class>

Distance function class that implements org.locationtech.geowave.analytics.distance.DistanceFn.

* --query.typeNames <types>

The comma-separated list of types to query; by default all types are used.

--query.auth <auths>

The comma-separated list of authorizations used during extract; by default all authorizations are used.

--query.index <index>

The specific index to query; by default one is chosen for each adapter.

* -emx, --extractMaxInputSplit <size>

Maximum HDFS input split size.

* -emn, --extractMinInputSplit <size>

Minimum HDFS input split size.

-eq, --extractQuery <query>

The query used to limit the records extracted from GeoWave.

-ofc, --outputOutputFormat <class>

Output format class.

-ifc, --inputFormatClass <class>

Input format class.

-orc, --outputReducerCount <count>

Number of reducers for output.

-cce, --centroidExtractorClass <class>

Centroid extractor class that implements org.locationtech.geowave.analytics.extract.CentroidExtractor.

-cid, --centroidIndexId <index>

Index to use for centroids.

-cfc, --centroidWrapperFactoryClass <class>

A factory class that implements org.locationtech.geowave.analytics.tools.AnalyticItemWrapperFactory.

-czl, --centroidZoomLevel <level>

Zoom level for centroids.

-cct, --clusteringConverganceTolerance <tolerance>

Convergence tolerance.

* -cmi, --clusteringMaxIterations <count>

Maximum number of iterations when finding optimal clusters.

-crc, --clusteringMaxReducerCount <count>

Maximum clustering reducer count.

* -zl, --clusteringZoomLevels <count>

Number of zoom levels to process.

-dde, --commonDimensionExtractClass <class>

Dimension extractor class that implements org.locationtech.geowave.analytics.extract.DimensionExtractor.

-ens, --extractDataNamespaceUri <namespace>

Output data namespace URI.

-ede, --extractDimensionExtractClass <class>

Class to extract dimensions into a simple feature output.

-eot, --extractOutputDataTypeId <type>

Output data type ID.

-erc, --extractReducerCount <count>

Number of reducers for initial data extraction and de-duplication.

-b, --globalBatchId <id>

Batch ID.

-pb, --globalParentBatchId <id>

Parent Batch ID.

-hns, --hullDataNamespaceUri <namespace>

Data type namespace for a centroid item.

-hdt, --hullDataTypeId <type>

Data type ID for a centroid item.

-hid, --hullIndexId <index>

Index to use for centroids.

-hpe, --hullProjectionClass <class>

Class to project on to 2D space. Implements org.locationtech.geowave.analytics.tools.Projection.

-hrc, --hullReducerCount <count>

Centroid reducer count.

-hfc, --hullWrapperFactoryClass <class>

Class to create analytic item to capture hulls. Implements org.locationtech.geowave.analytics.tools.AnalyticItemWrapperFactory.

* -jkp, --jumpKplusplusMin <value>

The minimum K when K-means parallel takes over sampling.

* -jrc, --jumpRangeOfCentroids <ranges>

Comma-separated range of centroids (e.g. 2,100).

EXAMPLES

The maximum number of clustering iterations is 15 (-cmi), the number of zoom levels is 1 (-zl), the maximum HDFS input split is 4000 (-emx), the minimum HDFS input split is 100 (-emn), the temporary files needed by this job are stored under the HDFS base directory /usr/rwgdrummer/temp_dir_kmeans (-hdfsbase), the HDFS IPC port is localhost:53000 (-hdfs), the job tracker is at localhost:8032 (-jobtracker), the type used is hail (--query.typeNames), the minimum K for K-means parallel sampling is 3 (-jkp), the comma-separated range of centroids is 4,8 (-jrc), and the data store parameters are loaded from my_store.

geowave analytic kmeansjump -cmi 15 -zl 1 -emx 4000 -emn 100 -hdfsbase /usr/rwgdrummer/temp_dir_kmeans -hdfs localhost:53000 -jobtracker localhost:8032 --query.typeNames hail -jkp 3 -jrc 4,8 my_store

EXECUTION

KMeansJump uses most of the same parameters from KMeansParallel. It tries every K value given (-jrc) to find the value with least entropy. The other value, jkp, will specify which K values should use K-means parallel for sampling versus a single sampler (which uses a random sample). For instance, if you specify 4,8 for jrc and 6 for jkp, then K=4,5 will use the K-means parallel sampler, while 6,7,8 will use the single sampler.
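
The sampler-selection rule above can be sketched in a few lines. This is a hypothetical illustration of the described behavior, not GeoWave code; the function names are made up.

```python
def sampler_for_k(k, jkp):
    """K values below the -jkp threshold use the K-means parallel
    sampler; K values at or above it use the single (random) sampler."""
    return "kmeans-parallel" if k < jkp else "single"


def samplers_for_range(k_min, k_max, jkp):
    """Map each K in the -jrc range (inclusive) to its sampler."""
    return {k: sampler_for_k(k, jkp) for k in range(k_min, k_max + 1)}
```

With -jrc 4,8 and -jkp 6, K=4 and K=5 map to the K-means parallel sampler, and K=6 through K=8 map to the single sampler, matching the example above.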

KMeansJump runs several iterations, each executing the sampler (described above, which also invokes the standard K-means algorithm to determine centroids) followed by a K-means distortion job, which calculates the entropy of the computed centroids.

See the EXECUTION documentation for the kmeansparallel command for a discussion of output, tolerance, and performance variables.

K-means Parallel

NAME

geowave-analytic-kmeansparallel - K-means Parallel Clustering

SYNOPSIS

geowave analytic kmeansparallel [options] <store name>

DESCRIPTION

This command executes a K-means Parallel Clustering analytic.

OPTIONS

-conf, --mapReduceConfigFile <file>

MapReduce configuration file.

* -hdfsbase, --mapReduceHdfsBaseDir <path>

Fully qualified path to the base directory in HDFS.

* -jobtracker, --mapReduceJobtrackerHostPort <host>

[REQUIRED (or -resourceman)] Hadoop job tracker hostname and port in the format hostname:port.

* -resourceman, --mapReduceYarnResourceManager <host>

[REQUIRED (or -jobtracker)] Yarn resource manager hostname and port in the format hostname:port.

-hdfs, --mapReduceHdfsHostPort <host>

HDFS hostname and port in the format hostname:port.

--cdf, --commonDistanceFunctionClass <class>

Distance function class that implements org.locationtech.geowave.analytics.distance.DistanceFn.

* --query.typeNames <types>

The comma-separated list of types to query; by default all types are used.

--query.auth <auths>

The comma-separated list of authorizations used during extract; by default all authorizations are used.

--query.index <index>

The specific index to query; by default one is chosen for each adapter.

* -emx, --extractMaxInputSplit <size>

Maximum HDFS input split size.

* -emn, --extractMinInputSplit <size>

Minimum HDFS input split size.

-eq, --extractQuery <query>

The query used to limit the records extracted from GeoWave.

-ofc, --outputOutputFormat <class>

Output format class.

-ifc, --inputFormatClass <class>

Input format class.

-orc, --outputReducerCount <count>

Number of reducers for output.

-cce, --centroidExtractorClass <class>

Centroid extractor class that implements org.locationtech.geowave.analytics.extract.CentroidExtractor.

-cid, --centroidIndexId <index>

Index to use for centroids.

-cfc, --centroidWrapperFactoryClass <class>

A factory class that implements org.locationtech.geowave.analytics.tools.AnalyticItemWrapperFactory.

-czl, --centroidZoomLevel <level>

Zoom level for centroids.

-cct, --clusteringConverganceTolerance <tolerance>

Convergence tolerance.

* -cmi, --clusteringMaxIterations <count>

Maximum number of iterations when finding optimal clusters.

-crc, --clusteringMaxReducerCount <count>

Maximum clustering reducer count.

* -zl, --clusteringZoomLevels <count>

Number of zoom levels to process.

-dde, --commonDimensionExtractClass <class>

Dimension extractor class that implements org.locationtech.geowave.analytics.extract.DimensionExtractor.

-ens, --extractDataNamespaceUri <namespace>

Output data namespace URI.

-ede, --extractDimensionExtractClass <class>

Class to extract dimensions into a simple feature output.

-eot, --extractOutputDataTypeId <type>

Output data type ID.

-erc, --extractReducerCount <count>

Number of reducers for initial data extraction and de-duplication.

-b, --globalBatchId <id>

Batch ID.

-pb, --globalParentBatchId <id>

Parent Batch ID.

-hns, --hullDataNamespaceUri <namespace>

Data type namespace for a centroid item.

-hdt, --hullDataTypeId <type>

Data type ID for a centroid item.

-hid, --hullIndexId <index>

Index to use for centroids.

-hpe, --hullProjectionClass <class>

Class to project on to 2D space. Implements org.locationtech.geowave.analytics.tools.Projection.

-hrc, --hullReducerCount <count>

Centroid reducer count.

-hfc, --hullWrapperFactoryClass <class>

Class to create analytic item to capture hulls. Implements org.locationtech.geowave.analytics.tools.AnalyticItemWrapperFactory.

* -sxs, --sampleMaxSampleSize <size>

Maximum sample size.

* -sms, --sampleMinSampleSize <size>

Minimum sample size.

* -ssi, --sampleSampleIterations <count>

Minimum number of sample iterations.

EXAMPLES

In the following example, the minimum clustering iterations is 15 (-cmi), the zoom level is 1 (-zl), the maximum HDFS input split is 4000 (-emx), the minimum HDFS input split is 100 (-emn), the temporary files needed by this job are stored in hdfs://host:port/user/rwgdrummer/temp_dir_kmeans (-hdfsbase), the HDFS IPC port is localhost:53000 (-hdfs), the Yarn job tracker is at localhost:8032 (-jobtracker), the type used is 'hail' (-query.typeNames), the minimum sample size is 4 (-sms, which is kmin), the maximum sample size is 8 (-sxs, which is kmax), the minimum number of sampling iterations is 10 (-ssi), and the data store parameters are loaded from my_store.

geowave analytic kmeansparallel -cmi 15 -zl 1 -emx 4000 -emn 100 -hdfsbase /usr/rwgdrummer/temp_dir_kmeans -hdfs localhost:53000 -jobtracker localhost:8032 --query.typeNames hail -sms 4 -sxs 8 -ssi 10 my_store

EXECUTION

K-means parallel tries to identify the optimal K (between -sms and -sxs) for a set of zoom levels (1 → -zl). When the zoom level is 1, it performs a normal K-means and finds K clusters. If the zoom level is 2 or higher, it takes each cluster found and tries to create sub-clusters (bounded by that cluster), identifying a new optimal K for each sub-cluster. As such, without powerful infrastructure, this approach could take a significant amount of time to complete with zoom levels higher than 1.

K-means parallel executes by first executing an extraction and de-duplication on data received via GeoWaveInputFormat. The data is copied to HDFS for faster processing. The K-sampler job is used to pick sample centroid points. These centroids are then assigned a cost, and then weak centroids are stripped before the K-sampler is executed again. This process iterates several times, before the best centroid locations are found, which are fed into the real K-means algorithm as initial guesses. K-means iterates until the tolerance is reached (-cct, which defaults to 0.0001) or the max iterations is met (-cmi).
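
The sample-then-refine flow described above can be sketched in plain Python. This is an illustrative stand-in for the MapReduce jobs, not GeoWave's actual implementation; the function names and the `passes` parameter are invented for the sketch:

```python
import math
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cost(points, centers):
    # Total cost: squared distance from each point to its nearest center.
    return sum(min(dist2(p, c) for c in centers) for p in points)

def parallel_sample(points, k, passes=3, rng=random):
    # Sampling phase: over several passes, add candidate centroids with
    # probability proportional to their current cost, then strip weak
    # candidates by keeping only the k that attract the most points.
    candidates = [rng.choice(points)]
    for _ in range(passes):
        total = cost(points, candidates)
        for p in points:
            if total > 0 and rng.random() < 2.0 * k * min(dist2(p, c) for c in candidates) / total:
                candidates.append(p)
    weights = [0] * len(candidates)
    for p in points:
        weights[min(range(len(candidates)), key=lambda i: dist2(p, candidates[i]))] += 1
    ranked = sorted(range(len(candidates)), key=lambda i: -weights[i])
    return [candidates[i] for i in ranked[:k]]

def kmeans(points, centers, tol=1e-4, max_iter=15):
    # Lloyd's algorithm: iterate until the tolerance (analogous to -cct)
    # is reached or the maximum iteration count (analogous to -cmi) is hit.
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[min(range(len(centers)), key=lambda i: dist2(p, centers[i]))].append(p)
        new_centers = [
            tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
        shift = max(math.sqrt(dist2(c, n)) for c, n in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers
```

The sampled candidates serve only as initial guesses; the final centroids come from the Lloyd iterations, just as in the description above.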

After execution, K-means parallel writes the centroids to an output data type (-eot, defaults to centroid), and then creates an informational set of convex hulls which you can plot in GeoServer to visually identify cluster groups (-hdt, defaults to convex_hull).

For tuning performance, you can set the number of reducers used in each step: the extraction/dedupe reducer count is -erc, the clustering reducer count is -crc, the convex hull reducer count is -hrc, and the output reducer count is -orc.

If you would like to run the algorithm multiple times, it may be useful to set the batch id (-b), which can be used to distinguish between multiple batches (runs).

K-means on Spark

NAME

geowave-analytic-kmeansspark - K-means Clustering via Spark ML

SYNOPSIS

geowave analytic kmeansspark [options] <input store name> <output store name>

DESCRIPTION

This command executes a K-means clustering analytic via Spark ML.

OPTIONS

-ct, --centroidType <type>

Feature type name for centroid output. Default is kmeans-centroids.

-ch, --computeHullData

If specified, hull count, area, and density will be computed.

--cqlFilter <filter>

An optional CQL filter applied to the input data.

-e, --epsilon <tolerance>

The convergence tolerance.

-f, --featureType <type>

Feature type name to query.

-ht, --hullType <type>

Feature type name for hull output. Default is kmeans-hulls.

-h, --hulls

If specified, convex hulls will be generated.

-ho, --host <host>

The Spark driver host. Default is localhost.

-m, --master <designation>

The Spark master designation. Default is yarn.

--maxSplits <count>

The maximum partitions for the input data.

--minSplits <count>

The minimum partitions for the input data.

-n, --name <name>

The Spark application name. Default is KMeans Spark.

-k, --numClusters <count>

The number of clusters to generate. Default is 8.

-i, --numIterations <count>

The number of iterations to run. Default is 20.

-t, --useTime

If specified, the time field from the input data will be used.

EXAMPLES

Perform a K-means analytic on a local spark cluster on the hail type in the my_store data store and output the results to the same data store:

geowave analytic kmeansspark -m local -f hail my_store my_store

Nearest Neighbor

NAME

geowave-analytic-nn - Nearest Neighbors

SYNOPSIS

geowave analytic nn [options] <store name>

DESCRIPTION

This command executes a Nearest Neighbors analytic. It is similar to DBScan, but with fewer arguments. Nearest neighbor simply dumps all near neighbors for every feature into a list of pairs. Most developers will want to extend the framework to add their own extensions.

OPTIONS

-conf, --mapReduceConfigFile <file>

MapReduce configuration file.

* -hdfsbase, --mapReduceHdfsBaseDir <path>

Fully qualified path to the base directory in HDFS.

* -jobtracker, --mapReduceJobtrackerHostPort <host>

[REQUIRED (or -resourceman)] Hadoop job tracker hostname and port in the format hostname:port.

* -resourceman, --mapReduceYarnResourceManager <host>

[REQUIRED (or -jobtracker)] Yarn resource manager hostname and port in the format hostname:port.

-hdfs, --mapReduceHdfsHostPort <host>

HDFS hostname and port in the format hostname:port.

--cdf, --commonDistanceFunctionClass <class>

Distance function class that implements org.locationtech.geowave.analytics.distance.DistanceFn.

* --query.typeNames <types>

The comma-separated list of types to query; by default all types are used.

--query.auth <auths>

The comma-separated list of authorizations used during extract; by default all authorizations are used.

--query.index <index>

The specific index to query; by default one is chosen for each adapter.

* -emx, --extractMaxInputSplit <size>

Maximum HDFS input split size.

* -emn, --extractMinInputSplit <size>

Minimum HDFS input split size.

-eq, --extractQuery <query>

The query to use during extraction.

-ofc, --outputOutputFormat <class>

Output format class.

-ifc, --inputFormatClass <class>

Input format class.

-orc, --outputReducerCount <count>

Number of reducers for output.

* -oop, --outputHdfsOutputPath <path>

Output HDFS file path.

-pdt, --partitionDistanceThresholds <thresholds>

Comma-separated list of distance thresholds, per dimension.

-pdu, --partitionGeometricDistanceUnit <unit>

Geometric distance unit (m=meters, km=kilometers; see symbols for javax.units.BaseUnit).

* -pmd, --partitionMaxDistance <distance>

Maximum partition distance.

-pms, --partitionMaxMemberSelection <count>

Maximum number of members selected from a partition.

-pp, --partitionPartitionPrecision <precision>

Partition precision.

-pc, --partitionPartitionerClass <class>

Perform primary partitioning for centroids with the provided class.

-psp, --partitionSecondaryPartitionerClass <class>

Perform secondary partitioning for centroids with the provided class.

EXAMPLES

In the following example, the minimum HDFS input split is 2 (-emn), the maximum HDFS input split is 6 (-emx), the maximum search distance is 1000 meters (-pmd), the sequence file output directory is hdfs://host:port/user/rwgdrummer_out (-oop), the reducer count is 4 (-orc), the HDFS IPC port is localhost:53000 (-hdfs), the Yarn job tracker is at localhost:8032 (-jobtracker), the temporary files needed by this job are stored in hdfs://host:port/user/rwgdrummer (-hdfsbase), the input type is gpxpoint (-query.typeNames), and the data store parameters are loaded from my_store.

geowave analytic nn -emn 2 -emx 6 -pmd 1000 -oop /user/rwgdrummer_out -orc 4 -hdfs localhost:53000 -jobtracker localhost:8032 -hdfsbase /user/rwgdrummer --query.typeNames gpxpoint my_store

EXECUTION

To execute nearest neighbor search in GeoWave, we use the concept of a "partitioner" to partition all data on the Hilbert curve into square segments for the purposes of parallelizing the search.

The default partitioner multiplies the maximum distance (-pmd) by 2 and uses that for the actual partition sizes. Because of this, the terminology is a bit confusing, but the -pmd option is the most important variable here, describing the maximum distance for a point to be considered a neighbor of another point.
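
The partitioning scheme can be sketched in Python: with square cells whose side is twice the maximum distance, any two points within the maximum distance of each other must fall in the same or an adjacent cell, so each cell can be searched independently (and hence in parallel). This is an illustrative sketch with invented names and a plain 2D grid, not GeoWave's Hilbert-curve implementation:

```python
import itertools
import math
from collections import defaultdict

def nearest_neighbors(points, max_distance):
    # Cell side = 2 * max_distance, mirroring the default partitioner's
    # doubling of -pmd described above.
    cell = 2.0 * max_distance
    grid = defaultdict(list)
    for p in points:
        grid[(int(math.floor(p[0] / cell)), int(math.floor(p[1] / cell)))].append(p)

    pairs = []
    for (cx, cy), members in grid.items():
        # Candidates come only from this cell and its 8 neighbors.
        candidates = []
        for dx, dy in itertools.product((-1, 0, 1), repeat=2):
            candidates.extend(grid.get((cx + dx, cy + dy), ()))
        for p in members:
            for q in candidates:
                if p < q and math.dist(p, q) <= max_distance:
                    pairs.append((p, q))
    # Pairs spanning two cells are found from both sides; dedupe them.
    return sorted(set(pairs))
```

As in the real analytic, the output is simply a list of neighbor pairs; the partitioning only bounds how much of the data each worker must examine.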

Spark SQL

NAME

geowave-analytic-sql - SparkSQL queries

SYNOPSIS

geowave analytic sql [options] <sql query>

DESCRIPTION

This command executes a Spark SQL query against a given data store, e.g. select * from <store name>[|<type name>] where <condition>. An alternate way of querying vector data is by using the vector query command, which does not use Spark, but provides a more robust set of querying capabilities.

OPTIONS

-n, --name <name>

The Spark application name. Default is GeoWave Spark SQL.

-ho, --host <host>

The Spark driver host. Default is localhost.

-m, --master <designation>

The Spark master designation. Default is yarn.

--csv <file>

The output CSV file name.

--out <store name>

The output data store name.

--outtype <type>

The type name to output results to.

-s, --show <count>

Number of result rows to display. Default is 20.

EXAMPLES

Select all features from the hail type in the my_store data store using a local Spark cluster:

geowave analytic sql -m local "select * from my_store|hail"

Spark Spatial Join

NAME

geowave-analytic-spatialjoin - Spatial join using Spark

SYNOPSIS

geowave analytic spatialjoin [options] <left store name> <right store name> <output store name>

DESCRIPTION

This command executes a spatial join, taking two input types and outputting features from each side that match a given predicate.

OPTIONS

-n, --name <name>

The Spark application name. Default is GeoWave Spark SQL.

-ho, --host <host>

The Spark driver host. Default is localhost.

-m, --master <designation>

The Spark master designation. Default is yarn.

-pc, --partCount <count>

The default partition count to set for Spark RDDs. Should be large enough to support the largest RDD that will be used. Sets spark.default.parallelism.

-lt, --leftTypeName <type>

Feature type name of left store to use in join.

-ol, --outLeftTypeName <type>

Feature type name of left join results.

-rt, --rightTypeName <type>

Feature type name of right store to use in join.

-or, --outRightTypeName <type>

Feature type name of right join results.

-p, --predicate <predicate>

Name of the UDF to use when performing the spatial join. Default is GeomIntersects.

-r, --radius <radius>

Used for distance join predicate and other spatial operations that require a scalar radius. Default is 0.01.

-not, --negative

If specified, test for a negative result from the geometry predicate, i.e., GeomIntersects() == false.

EXAMPLES

Using a local Spark cluster, join all features from the hail type in the my_store store that intersect features from the boundary type in the other_store store, and output the results to the left and right types in the my_store data store:

geowave analytic spatialjoin -m local -lt hail -rt boundary -ol left -or right my_store other_store my_store
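
The left/right output semantics can be sketched in plain Python, using axis-aligned bounding boxes as a stand-in for real geometries. All names here are invented for illustration; GeoWave's actual join runs on Spark RDDs with real geometry predicates:

```python
def geom_intersects(a, b):
    # Bounding boxes as (minx, miny, maxx, maxy) tuples; a stand-in for
    # the GeomIntersects UDF named by the -p option.
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def spatial_join(left, right, predicate=geom_intersects, negative=False):
    # Return the features from each side that satisfy the predicate (or
    # fail it, when negative=True, as with the -not flag), mirroring the
    # left/right outputs of the spatialjoin command.
    out_left, out_right = [], []
    for name_l, geom_l in left.items():
        for name_r, geom_r in right.items():
            if predicate(geom_l, geom_r) != negative:
                out_left.append(name_l)
                out_right.append(name_r)
    return sorted(set(out_left)), sorted(set(out_right))
```

Each matching pair contributes one feature to the left output type and one to the right output type, which is why the command takes two output type names (-ol and -or).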

Vector Commands

Commands that operate on vector data.

Query

NAME

geowave-vector-query - Query vector data using GeoWave Query Language

SYNOPSIS

geowave vector query [options] <query>

DESCRIPTION

This command queries vector data using an SQL-like syntax. The query language currently only supports SELECT and DELETE statements.

The syntax for SELECT statements is as follows:

SELECT <attributes> FROM <storeName>.<typeName> [ WHERE CQL(<cqlFilter>) ]

Where <attributes> is a comma-separated list of column selectors or aggregation functions, <storeName> is the data store name, <typeName> is the type name, and <cqlFilter> is a CQL filter to filter the results by.

The syntax for DELETE statements is as follows:

DELETE FROM <storeName>.<typeName> [ WHERE CQL(<cqlFilter>) ]

Where <storeName> is the data store name, <typeName> is the type name, and <cqlFilter> is the filter to delete results by.

OPTIONS

--debug

If specified, print out additional info for debug purposes.

-f, --format <format>

Output format for query results. Possible values are console, csv, shp, and geojson. Both shp and geojson formats require that the query results contain at least 1 geometry column. Default is console.

When the csv format is used, additional options are:

* -o, --outputFile <file>

CSV file to output query results to.

When the shp format is used, additional options are:

* -o, --outputFile <file>

Shapefile to output query results to.

-t, --typeName <name>

Output feature type name.

When the geojson format is used, additional options are:

* -o, --outputFile <file>

GeoJSON file to output query results to.

-t, --typeName <name>

Output feature type name.

EXAMPLES

Calculate the total population of countries that intersect a bounding box that covers a region of Europe:

geowave vector query "SELECT SUM(population) FROM example.countries WHERE CQL(BBOX(geom, 7, 46, 23, 51))"

Select only countries that have a population over 100 million:

geowave vector query "SELECT * FROM example.countries WHERE CQL(population>100000000)"

Output country names and populations to a CSV file:

geowave vector query -f csv -o myfile.csv "SELECT name, population FROM example.countries"

CQL Delete

NAME

geowave-vector-cqldelete - Delete data that matches a CQL filter

SYNOPSIS

geowave vector cqldelete [options] <store name>

DESCRIPTION

This command deletes all data in a data store that matches a CQL filter.

OPTIONS

--typeName <type>

The type to delete data from.

* --cql <filter>

All data that matches the CQL filter will be deleted.

--debug

If specified, print out additional info for debug purposes.

--indexName <index>

The name of the index to delete from.

EXAMPLES

Delete all data from the hail type in the example data store that lies within the given bounding box:

geowave vector cqldelete --typeName hail --cql "BBOX(geom, 7, 46, 23, 51)" example

Local Export

NAME

geowave-vector-localexport - Export vector data from a data store to Avro

SYNOPSIS

geowave vector localexport [options] <store name>

DESCRIPTION

This command exports vector data from a GeoWave data store to an Avro file.

OPTIONS

--typeNames <types>

Comma-separated list of types to export.

--batchSize <size>

Records to process at a time. Default is 10,000.

--cqlFilter <filter>

Filter exported data based on CQL filter.

--indexName <index>

The name of the index to export from.

* --outputFile <file>

The file to export data to.

EXAMPLES

Export all data from the hail type in the example data store to an Avro file:

geowave vector localexport --typeNames hail --outputFile out.avro example

MapReduce Export

NAME

geowave-vector-mrexport - Export vector data from a data store to Avro using MapReduce

SYNOPSIS

geowave vector mrexport [options] <path to base directory to write to> <store name>

DESCRIPTION

This command will perform a data export for vector data in a data store, and will use MapReduce to support high-volume data stores.

OPTIONS

--typeNames <types>

Comma-separated list of types to export.

--batchSize <size>

Records to process at a time. Default is 10,000.

--cqlFilter <filter>

Filter exported data based on CQL filter.

--indexName <index>

The name of the index to export from.

--maxSplits <count>

The maximum partitions for the input data.

--minSplits <count>

The minimum partitions for the input data.

--resourceManagerHostPort <host>

The host and port of the resource manager.

EXAMPLES

Export all data from the hail type in the example data store to an Avro file using MapReduce:

geowave vector mrexport --typeNames hail --resourceManagerHostPort localhost:8032 /export example

Raster Commands

Commands that operate on raster data.

Resize with MapReduce

NAME

geowave-raster-resizemr - Resize Raster Tiles using MapReduce

SYNOPSIS

geowave raster resizemr [options] <input store name> <output store name>

DESCRIPTION

This command will resize raster tiles that are stored in a GeoWave data store using MapReduce, and write the resized tiles to a new output store.

OPTIONS

* --hdfsHostPort <host>

The HDFS host and port.

--indexName <index>

The index that the input raster is stored in.

* --inputCoverageName <name>

The name of the input raster coverage.

* --jobSubmissionHostPort <host>

The job submission tracker host and port.

--maxSplits <count>

The maximum partitions for the input data.

--minSplits <count>

The minimum partitions for the input data.

* --outputCoverageName <name>

The output raster coverage name.

* --outputTileSize <size>

The tile size to output.

EXAMPLES

Resize the cov raster in the example data store to 256 and name the resulting raster cov_resized:

geowave raster resizemr --hdfsHostPort localhost:53000 --jobSubmissionHostPort localhost:8032 --inputCoverageName cov --outputCoverageName cov_resized --outputTileSize 256 example example

Resize with Spark

NAME

geowave-raster-resizespark - Resize Raster Tiles using Spark

SYNOPSIS

geowave raster resizespark [options] <input store name> <output store name>

DESCRIPTION

This command will resize raster tiles that are stored in a GeoWave data store using Spark, and write the resized tiles to a new output store.

OPTIONS

-ho, --host <host>

The Spark driver host. Default is localhost.

--indexName <index>

The index that the input raster is stored in.

* --inputCoverageName <name>

The name of the input raster coverage.

-m, --master <designation>

The Spark master designation. Default is yarn.

--maxSplits <count>

The maximum partitions for the input data.

--minSplits <count>

The minimum partitions for the input data.

-n, --name <name>

The Spark application name. Default is RasterResizeRunner.

* --outputCoverageName <name>

The output raster coverage name.

* --outputTileSize <size>

The tile size to output.

EXAMPLES

Resize the cov raster in the example data store to 256 and name the resulting raster cov_resized:

geowave raster resizespark -m local --inputCoverageName cov --outputCoverageName cov_resized --outputTileSize 256 example example

Install GDAL

NAME

geowave-raster-installgdal - Install GDAL by downloading native libraries

SYNOPSIS

geowave raster installgdal [options]

DESCRIPTION

This command installs the version of GDAL that is used by GeoWave. By default, it is installed to the GeoWave home directory under lib/utilities/gdal. If an alternate directory is provided, it should be added to the PATH environment variable for Mac and Windows users, or the LD_LIBRARY_PATH environment variable for Linux users.

OPTIONS

--dir <path>

The download directory.

EXAMPLES

Install GDAL native libraries:

geowave raster installgdal

GeoServer Commands

Commands that manage GeoServer stores and layers.

Run GeoServer

NAME

geowave-gs-run - Runs a standalone GeoServer instance

SYNOPSIS

geowave gs run [options]

DESCRIPTION

This command runs a standalone GeoServer instance.

OPTIONS

-d, --directory <path>

The directory to use for GeoServer. Default is ./lib/services/third-party/embedded-geoserver/geoserver.

-i, --interactive

If specified, prompt for user input to end the process.

-p, --port <port>

Select the port for GeoServer to listen on. Default is 8080.

EXAMPLES

Run a standalone GeoServer instance:

geowave gs run

Store Commands

Add Store

NAME

geowave-gs-ds-add - Add a data store to GeoServer

SYNOPSIS

geowave gs ds add [options] <data store name>
geowave geoserver datastore add [options] <data store name>

DESCRIPTION

This command adds a GeoWave data store to GeoServer as a GeoWave store.

OPTIONS

-ds, --datastore <name>

The name of the new GeoWave store to add to GeoServer.

-ws, --workspace <workspace>

The GeoServer workspace to use for the store.

EXAMPLES

Add a GeoWave data store example as a GeoWave store in GeoServer called my_store:

geowave gs ds add -ds my_store example

Get Store

NAME

geowave-gs-ds-get - Get GeoServer store info

SYNOPSIS

geowave gs ds get [options] <store name>
geowave geoserver datastore get [options] <store name>

DESCRIPTION

This command returns information about a store within the configured GeoServer instance.

OPTIONS

-ws, --workspace <workspace>

The GeoServer workspace to use.

EXAMPLES

Get information about the my_store store from GeoServer:

geowave gs ds get my_store

Get Store Adapters

NAME

geowave-gs-ds-getsa - Get type info from a GeoWave store

SYNOPSIS

geowave gs ds getsa <store name>
geowave geoserver datastore getstoreadapters <store name>

DESCRIPTION

This command returns information about all the GeoWave types in a store from the configured GeoServer instance.

EXAMPLES

Get information about all the GeoWave types in the my_store store on GeoServer:

geowave gs ds getsa my_store

List Stores

NAME

geowave-gs-ds-list - List GeoServer stores

SYNOPSIS

geowave gs ds list [options]
geowave geoserver datastore list [options]

DESCRIPTION

This command lists stores from the configured GeoServer instance.

OPTIONS

-ws, --workspace <workspace>

The GeoServer workspace to use.

EXAMPLES

List all stores in GeoServer:

geowave gs ds list

Remove Store

NAME

geowave-gs-ds-rm - Remove GeoServer store

SYNOPSIS

geowave gs ds rm [options] <store name>
geowave geoserver datastore rm [options] <store name>

DESCRIPTION

This command removes a store from the configured GeoServer instance.

OPTIONS

-ws, --workspace <workspace>

The GeoServer workspace to use.

EXAMPLES

Remove the my_store store from GeoServer:

geowave gs ds rm my_store

Coverage Store Commands

Add Coverage Store

NAME

geowave-gs-cs-add - Add a coverage store to GeoServer

SYNOPSIS

geowave gs cs add [options] <store name>
geowave geoserver coveragestore add [options] <store name>

DESCRIPTION

This command adds a coverage store to the configured GeoServer instance. It requires that a GeoWave store has already been added.

OPTIONS

-cs, --coverageStore <name>

The name of the coverage store to add.

-histo, --equalizeHistogramOverride

Overrides the default behavior so that histogram equalization is always performed if a histogram exists.

-interp, --interpolationOverride <value>

This will override the default interpolation stored for each layer. Valid values are 0, 1, 2, 3 for NearestNeighbor, Bilinear, Bicubic, and Bicubic (polynomial variant) respectively.

-scale, --scaleTo8Bit

By default, integer values will automatically be scaled to 8-bit and floating point values will not. This behavior can be overridden by setting this option.

-ws, --workspace <workspace>

The GeoServer workspace to add the coverage store to.

EXAMPLES

Add a coverage store called cov_store to GeoServer using the my_store GeoWave store:

geowave gs cs add -cs cov_store my_store

Get Coverage Store

NAME

geowave-gs-cs-get - Get GeoServer coverage store info

SYNOPSIS

geowave gs cs get [options] <coverage store name>
geowave geoserver coveragestore get [options] <coverage store name>

DESCRIPTION

This command will return information about a coverage store from the configured GeoServer instance.

OPTIONS

-ws, --workspace <workspace>

The GeoServer workspace to use.

EXAMPLES

Get information about the coverage store called my_store from GeoServer:

geowave gs cs get my_store

List Coverage Stores

NAME

geowave-gs-cs-list - List GeoServer coverage stores

SYNOPSIS

geowave gs cs list [options]
geowave geoserver coveragestore list [options]

DESCRIPTION

This command lists all coverage stores in the configured GeoServer instance.

OPTIONS

-ws, --workspace <workspace>

The GeoServer workspace to use.

EXAMPLES

List all coverage stores in GeoServer:

geowave gs cs list

Remove Coverage Store

NAME

geowave-gs-cs-rm - Remove GeoServer Coverage Store

SYNOPSIS

geowave gs cs rm [options] <coverage store name>
geowave geoserver coveragestore rm [options] <coverage store name>

DESCRIPTION

This command removes a coverage store from the configured GeoServer instance.

OPTIONS

-ws, --workspace <workspace>

The GeoServer workspace to use.

EXAMPLES

Remove the cov_store coverage store from GeoServer:

geowave gs cs rm cov_store

Coverage Commands

Add Coverage

NAME

geowave-gs-cv-add - Add a coverage to GeoServer

SYNOPSIS

geowave gs cv add [options] <coverage name>
geowave geoserver coverage add [options] <coverage name>

DESCRIPTION

This command adds a coverage to the configured GeoServer instance.

OPTIONS

* -cs, --cvgstore <name>

Coverage store name.

-ws, --workspace <workspace>

GeoServer workspace to add the coverage to.

EXAMPLES

Add a coverage called cov to the cov_store coverage store on the configured GeoServer instance:

geowave gs cv add -cs cov_store cov

Get Coverage

NAME

geowave-gs-cv-get - Get a GeoServer coverage’s info

SYNOPSIS

geowave gs cv get [options] <coverage name>
geowave geoserver coverage get [options] <coverage name>

DESCRIPTION

This command returns information about a coverage from the configured GeoServer instance.

OPTIONS

-cs, --coverageStore <name>

The name of the GeoServer coverage store.

-ws, --workspace <workspace>

The GeoServer workspace to use.

EXAMPLES

Get information about the cov coverage in the cov_store coverage store:

geowave gs cv get -cs cov_store cov

List Coverages

NAME

geowave-gs-cv-list - List GeoServer coverages

SYNOPSIS

geowave gs cv list [options] <coverage store name>
geowave geoserver coverage list [options] <coverage store name>

DESCRIPTION

This command lists all coverages from a given coverage store in the configured GeoServer instance.

OPTIONS

-ws, --workspace <workspace>

The GeoServer workspace to use.

EXAMPLES

List all coverages in the cov_store coverage store on GeoServer:

geowave gs cv list cov_store

Remove Coverage

NAME

geowave-gs-cv-rm - Remove a GeoServer coverage

SYNOPSIS

geowave gs cv rm [options] <coverage name>
geowave geoserver coverage rm [options] <coverage name>

DESCRIPTION

This command removes a coverage from the configured GeoServer instance.

OPTIONS

* -cs, --cvgstore <name>

The coverage store that contains the coverage.

-ws, --workspace <workspace>

The GeoServer workspace to use.

EXAMPLES

Remove the cov coverage from the cov_store coverage store in GeoServer:

geowave gs cv rm -cs cov_store cov

Layer Commands

Add GeoWave Layer

NAME

geowave-gs-layer-add - Add a GeoServer layer from the given GeoWave data store

SYNOPSIS

geowave gs layer add [options] <data store name>
geowave geoserver layer add [options] <data store name>

DESCRIPTION

This command adds a layer from the given GeoWave data store to the configured GeoServer instance. Unlike gs fl add, this command adds a layer directly from a GeoWave data store, automatically creating the GeoWave store for it in GeoServer.

OPTIONS

-t, --typeName <type>

Add the type with the given name to GeoServer.

-a, --add <layer type>

Add all layers of the given type to GeoServer. Possible values are ALL, RASTER, and VECTOR.

-sld, --setStyle <style>

The default style to use for the added layers.

-ws, --workspace <workspace>

The GeoServer workspace to use.

EXAMPLES

Add a type called hail from the example data store to GeoServer:

geowave gs layer add -t hail example

Add all types from the example data store to GeoServer:

geowave gs layer add --add ALL example

Add all vector types from the example data store to GeoServer:

geowave gs layer add --add VECTOR example

Add Feature Layer

NAME

geowave-gs-fl-add - Add a feature layer to GeoServer

SYNOPSIS

geowave gs fl add [options] <layer name>
geowave geoserver featurelayer add [options] <layer name>

DESCRIPTION

This command adds a feature layer from a GeoWave store to the configured GeoServer instance.

OPTIONS

* -ds, --datastore <name>

The GeoWave store (on GeoServer) to add the layer from.

-ws, --workspace <workspace>

The GeoServer workspace to use.

EXAMPLES

Add a layer called hail from the my_store GeoWave store:

geowave gs fl add -ds my_store hail

Get Feature Layer

NAME

geowave-gs-fl-get - Get GeoServer feature layer info

SYNOPSIS

geowave gs fl get <layer name>
geowave geoserver featurelayer get <layer name>

DESCRIPTION

This command returns information about a layer in the configured GeoServer instance.

EXAMPLES

Get information about the layer hail from GeoServer:

geowave gs fl get hail

List Feature Layers

NAME

geowave-gs-fl-list - List GeoServer feature layers

SYNOPSIS

geowave gs fl list [options]
geowave geoserver featurelayer list [options]

DESCRIPTION

This command lists feature layers from the configured GeoServer instance.

OPTIONS

-ds, --datastore <name>

The GeoServer store name to list feature layers from.

-g, --geowaveOnly

If specified, only layers from GeoWave stores will be listed.

-ws, --workspace <workspace>

The GeoServer workspace to use.

EXAMPLES

List all feature layers in GeoServer:

geowave gs fl list

List all GeoWave feature layers in GeoServer:

geowave gs fl list -g

List all feature layers from the my_store store in GeoServer:

geowave gs fl list -ds my_store

Remove Feature Layer

NAME

geowave-gs-fl-rm - Remove GeoServer feature Layer

SYNOPSIS

geowave gs fl rm <layer name>
geowave geoserver featurelayer rm <layer name>

DESCRIPTION

This command removes a feature layer from the configured GeoServer instance.

EXAMPLES

Remove the hail layer from GeoServer:

geowave gs fl rm hail

Style Commands

Add Style

NAME

geowave-gs-style-add - Add a style to GeoServer

SYNOPSIS

geowave gs style add [options] <style name>
geowave geoserver style add [options] <style name>

DESCRIPTION

This command adds an SLD style file to the configured GeoServer instance.

OPTIONS

* -sld, --stylesld <file>

The SLD to add to GeoServer.

EXAMPLES

Add the my_sld.sld style file to GeoServer with the name my_style:

geowave gs style add -sld my_sld.sld my_style

Get Style

NAME

geowave-gs-style-get - Get GeoServer style info

SYNOPSIS

geowave gs style get <style name>
geowave geoserver style get <style name>

DESCRIPTION

This command returns information about a style from the configured GeoServer instance.

EXAMPLES

Get information about the my_style style on GeoServer:

geowave gs style get my_style

List Styles

NAME

geowave-gs-style-list - List GeoServer styles

SYNOPSIS

geowave gs style list
geowave geoserver style list

DESCRIPTION

This command lists all styles in the configured GeoServer instance.

EXAMPLES

List all styles in GeoServer:

geowave gs style list

Remove Style

NAME

geowave-gs-style-rm - Remove GeoServer style

SYNOPSIS

geowave gs style rm <style name>
geowave geoserver style rm <style name>

DESCRIPTION

This command removes a style from the configured GeoServer instance.

EXAMPLES

Remove the my_style style from GeoServer:

geowave gs style rm my_style

Set Layer Style

NAME

geowave-gs-style-set - Set GeoServer layer style

SYNOPSIS

geowave gs style set [options] <layer name>
geowave geoserver style set [options] <layer name>

DESCRIPTION

This command sets the layer style to the specified style in the configured GeoServer instance.

OPTIONS

* -sn, --styleName <name>

The name of the style to set on the layer.

EXAMPLES

Set the style on the hail layer to my_style:

geowave gs style set -sn my_style hail

Workspace Commands

Add Workspace

NAME

geowave-gs-ws-add - Add a workspace to GeoServer

SYNOPSIS

geowave gs ws add <workspace name>
geowave geoserver workspace add <workspace name>

DESCRIPTION

This command adds a new workspace to the configured GeoServer instance.

EXAMPLES

Add a new workspace to GeoServer called geowave:

geowave gs ws add geowave

List Workspaces

NAME

geowave-gs-ws-list - List GeoServer workspaces

SYNOPSIS

geowave gs ws list
geowave geoserver workspace list

DESCRIPTION

This command lists all workspaces in the configured GeoServer instance.

EXAMPLES

List all workspaces in GeoServer:

geowave gs ws list

Remove Workspace

NAME

geowave-gs-ws-rm - Remove GeoServer workspace

SYNOPSIS

geowave gs ws rm <workspace name>
geowave geoserver workspace rm <workspace name>

DESCRIPTION

This command removes a workspace from the configured GeoServer instance.

EXAMPLES

Remove the geowave workspace from GeoServer:

geowave gs ws rm geowave

Utility Commands

Miscellaneous operations that don’t really warrant their own top-level command. This includes commands to start standalone data stores and services.

Standalone Store Commands

Commands that stand up standalone stores for testing and debug purposes.

Run Standalone Accumulo

NAME

geowave-util-accumulo-run - Runs a standalone mini Accumulo server for test and debug with GeoWave

SYNOPSIS

geowave util accumulo run

DESCRIPTION

This command runs a standalone mini single-node Accumulo server, which can be used locally for testing and debugging GeoWave, without needing to stand up an entire cluster.

EXAMPLES

Run a standalone Accumulo instance:

geowave util accumulo run

Run Standalone Bigtable

NAME

geowave-util-bigtable-run - Runs a standalone Bigtable instance for test and debug with GeoWave

SYNOPSIS

geowave util bigtable run [options]

DESCRIPTION

This command runs a standalone Bigtable instance, which can be used locally for testing and debugging GeoWave, without needing to set up a full instance.

OPTIONS

-d, --directory <path>

The directory to use for Bigtable. Default is ./target/temp.

-i, --interactive <enabled>

Whether to prompt for user input to end the process. Default is true.

-p, --port <host:port>

The host and port the emulator will run on. Default is 127.0.0.1:8086.

-s, --sdk <sdk>

The name of the Bigtable SDK. Default is google-cloud-sdk-183.0.0-linux-x86_64.tar.gz.

-u, --url <url>

The URL location to download Bigtable. Default is https://dl.google.com/dl/cloudsdk/channels/rapid/downloads.

EXAMPLES

Run a standalone Bigtable instance:

geowave util bigtable run -d .

Run Standalone Cassandra

NAME

geowave-util-cassandra-run - Runs a standalone Cassandra instance for test and debug with GeoWave

SYNOPSIS

geowave util cassandra run [options]

DESCRIPTION

This command runs a standalone Cassandra instance, which can be used locally for testing and debugging GeoWave, without needing to set up a full instance.

OPTIONS

-c, --clusterSize <size>

The number of individual Cassandra processes to run. Default is 1.

-d, --directory <path>

The directory to use for Cassandra.

-i, --interactive <enabled>

Whether to prompt for user input to end the process. Default is true.

-m, --maxMemoryMB <size>

The maximum memory to use in MB. Default is 512.

EXAMPLES

Run a standalone Cassandra instance:

geowave util cassandra run -d .

Run Standalone DynamoDB

NAME

geowave-util-dynamodb-run - Runs a standalone DynamoDB instance for test and debug with GeoWave

SYNOPSIS

geowave util dynamodb run [options]

DESCRIPTION

This command runs a standalone DynamoDB instance, which can be used locally for testing and debugging GeoWave, without needing to set up a full instance.

OPTIONS

-d, --directory <path>

The directory to use for DynamoDB.

-i, --interactive <enabled>

Whether to prompt for user input to end the process. Default is true.

EXAMPLES

Run a standalone DynamoDB instance:

geowave util dynamodb run -d .

Run Standalone HBase

NAME

geowave-util-hbase-run - Runs a standalone HBase instance for test and debug with GeoWave

SYNOPSIS

geowave util hbase run [options]

DESCRIPTION

This command runs a standalone HBase instance, which can be used locally for testing and debugging GeoWave, without needing to set up a full instance.

OPTIONS

-a, --auth <authorizations>

A list of authorizations to grant the admin user.

-d, --dataDir <path>

Directory for HBase server-side data. Default is ./lib/services/third-party/embedded-hbase/data.

-i, --interactive

If specified, prompt for user input to end the process.

-l, --libDir <path>

Directory for HBase server-side libraries. Default is ./lib/services/third-party/embedded-hbase/lib.

-r, --regionServers <count>

The number of region server processes. Default is 1.

-z, --zkDataDir <path>

The data directory for the Zookeeper instance. Default is ./lib/services/third-party/embedded-hbase/zookeeper.

EXAMPLES

Run a standalone HBase instance:

geowave util hbase run

Run Standalone Kudu

NAME

geowave-util-kudu-run - Runs a standalone Kudu instance for test and debug with GeoWave

SYNOPSIS

geowave util kudu run [options]

DESCRIPTION

This command runs a standalone Kudu instance, which can be used locally for testing and debugging GeoWave, without needing to set up a full instance.

OPTIONS

-d, --directory <path>

The directory to use for Kudu. Default is ./target/temp.

-i, --interactive <enabled>

Whether to prompt for user input to end the process. Default is true.

-t, --tablets <count>

The number of tablets to use for Kudu. Default is 0.

EXAMPLES

Run a standalone Kudu instance:

geowave util kudu run -d . -t 2

Run Standalone Redis

NAME

geowave-util-redis-run - Runs a standalone Redis instance for test and debug with GeoWave

SYNOPSIS

geowave util redis run [options]

DESCRIPTION

This command runs a standalone Redis instance, which can be used locally for testing and debugging GeoWave, without needing to set up a full instance.

OPTIONS

-d, --directory <path>

The directory to use for Redis. If set, the data will be persisted and durable. If not set, a temporary directory will be used and deleted when the process completes.

-i, --interactive <enabled>

Whether to prompt for user input to end the process. Default is true.

-m, --maxMemory <size>

The maximum memory to use (in a form such as 512M or 1G). Default is 1G.

-p, --port <port>

The port for Redis to listen on. Default is 6379.

-s, --setting <setting>

A setting to apply to Redis in the form of <name>=<value>.

EXAMPLES

Run a standalone Redis instance:

geowave util redis run

Accumulo Commands

Utility operations to set Accumulo splits and run a test server.

Run Standalone

NAME

geowave-util-accumulo-run - Runs a standalone mini Accumulo server for test and debug with GeoWave

SYNOPSIS

geowave util accumulo run

DESCRIPTION

This command runs a standalone mini single-node Accumulo server, which can be used locally for testing and debugging GeoWave, without needing to stand up an entire cluster.

EXAMPLES

Run a standalone Accumulo instance:

geowave util accumulo run

Pre-split Partition IDs

NAME

geowave-util-accumulo-presplitpartitionid - Pre-split Accumulo table by providing the number of partition IDs

SYNOPSIS

geowave util accumulo presplitpartitionid [options] <store name>

DESCRIPTION

This command pre-splits an Accumulo table by providing the number of partition IDs.

OPTIONS

--indexName <name>

The GeoWave index. Default is all indices.

--num <count>

The number of partitions.

EXAMPLES

Pre-split the spatial_idx table to 8 partitions in the example data store:

geowave util accumulo presplitpartitionid --indexName spatial_idx --num 8 example

Split Equal Interval

NAME

geowave-util-accumulo-splitequalinterval - Set Accumulo splits by providing the number of partitions based on an equal interval strategy

SYNOPSIS

geowave util accumulo splitequalinterval [options] <store name>

DESCRIPTION

This command sets the Accumulo data store splits by providing the number of partitions based on an equal interval strategy.

OPTIONS

--indexName <name>

The GeoWave index. Default is all indices.

--num <count>

The number of partitions.

EXAMPLES

Split the spatial_idx table to 8 partitions using an equal interval strategy in the example data store:

geowave util accumulo splitequalinterval --indexName spatial_idx --num 8 example
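The equal-interval idea can be sketched in a few lines of Python (an illustration of the concept only, not GeoWave code): the key space is divided into a number of ranges of identical width, regardless of how the data is distributed.

```python
def equal_interval_splits(num_partitions, key_space=256):
    """Divide a key space of key_space values into num_partitions equal ranges.

    Returns the split points between adjacent ranges.
    """
    width = key_space // num_partitions
    return [i * width for i in range(1, num_partitions)]

# Splitting an 8-bit partition key space into 8 partitions yields
# split points every 32 values.
splits = equal_interval_splits(8)  # -> [32, 64, 96, 128, 160, 192, 224]
```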

Split by Number of Records

NAME

geowave-util-accumulo-splitnumrecords - Set Accumulo splits by providing the number of entries per split

SYNOPSIS

geowave util accumulo splitnumrecords [options] <store name>

DESCRIPTION

This command sets the Accumulo data store splits by providing the number of entries per split.

OPTIONS

--indexName <name>

The GeoWave index. Default is all indices.

--num <count>

The number of entries.

EXAMPLES

Set the number of entries per split to 1000 on the spatial_idx index of the example data store:

geowave util accumulo splitnumrecords --indexName spatial_idx --num 1000 example

Split Quantile Distribution

NAME

geowave-util-accumulo-splitquantile - Set Accumulo splits by providing the number of partitions based on a quantile distribution strategy

SYNOPSIS

geowave util accumulo splitquantile [options] <store name>

DESCRIPTION

This command allows a user to set the Accumulo data store splits by providing the number of partitions based on a quantile distribution strategy.

OPTIONS

--indexName <name>

The GeoWave index. Default is all indices.

--num <count>

The number of partitions.

EXAMPLES

Split the spatial_idx table to 8 partitions using a quantile distribution strategy in the example data store:

geowave util accumulo splitquantile --indexName spatial_idx --num 8 example
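The contrast with the equal-interval strategy can be sketched in Python (an illustration of the concept only, not GeoWave code): split points are taken at evenly spaced quantiles of the observed keys, so densely populated key ranges receive more splits.

```python
def quantile_splits(sorted_keys, num_partitions):
    """Choose split points at evenly spaced quantiles of the observed keys."""
    n = len(sorted_keys)
    return [sorted_keys[(i * n) // num_partitions] for i in range(1, num_partitions)]

# Skewed sample: most keys cluster at the low end, so two of the three
# split points land in the dense region rather than being spread evenly.
keys = [1, 1, 1, 2, 3, 10, 50, 100]
splits = quantile_splits(keys, 4)  # -> [1, 3, 50]
```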

OSM Commands

Operations to ingest OpenStreetMap (OSM) nodes, ways, and relations into GeoWave.

OSM commands are not included in GeoWave by default.

Import OSM

NAME

geowave-util-osm-ingest - Ingest and convert OSM data from HDFS to GeoWave

SYNOPSIS

geowave util osm ingest [options] <hdfs host:port> <path to base directory to read from> <store name>

DESCRIPTION

This command ingests and converts OSM data from HDFS into GeoWave.

OPTIONS

-jn, --jobName

Name of mapreduce job. Default is Ingest (mcarrier).

-m, --mappingFile

Mapping file, in imposm3 format.

--table

OSM Table name in GeoWave. Default is OSM.

* -t, --type

Mapper type - one of node, way, or relation.

-v, --visibility

The visibility of the data ingested (optional; default is 'public').
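
EXAMPLES

Ingest OSM nodes that were previously staged in HDFS into the example data store (the HDFS host and staging directory shown are illustrative placeholders):

geowave util osm ingest -t node hdfs-host:8020 osm_stage example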

Stage OSM

NAME

geowave-util-osm-stage - Stage OSM data to HDFS

SYNOPSIS

geowave util osm stage [options] <file or directory> <hdfs host:port> <path to base directory to write to>

DESCRIPTION

This command stages OSM data from a local directory and writes it to HDFS.

OPTIONS

--extension

The PBF file extension. Default is .pbf.
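
EXAMPLES

Stage the germany.osm.pbf file to HDFS (the HDFS host and target directory shown are illustrative placeholders):

geowave util osm stage germany.osm.pbf hdfs-host:8020 osm_stage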


Python Commands

Commands for use with the GeoWave Python bindings.

Run Py4J Java Gateway

NAME

geowave-util-python-rungateway - Run a Py4J java gateway

SYNOPSIS

geowave util python rungateway

DESCRIPTION

This command starts the Py4J java gateway required by pygw.

EXAMPLES

Run the Py4J java gateway:

geowave util python rungateway

Landsat8 Commands

Operations to analyze, download, and ingest Landsat 8 imagery publicly available on AWS.

Analyze Landsat 8

NAME

geowave-util-landsat-analyze - Print out basic aggregate statistics for available Landsat 8 imagery

SYNOPSIS

geowave util landsat analyze [options]

DESCRIPTION

This command prints out basic aggregate statistics for the available Landsat 8 imagery.

OPTIONS

--cql <filter>

An optional CQL expression to filter the ingested imagery. The feature type for the expression has the following attributes: shape (Geometry), acquisitionDate (Date), cloudCover (double), processingLevel (String), path (int), and row (int); the feature ID is the entityId of the scene. Additionally, attributes of the individual bands can be used, such as band (String), sizeMB (double), and bandDownloadUrl (String).

--nbestbands <count>

An option to identify and only use a set number of bands with the best cloud cover.

--nbestperspatial

A flag that when applied with --nbestscenes or --nbestbands will aggregate scenes and/or bands by path/row.

--nbestscenes <count>

An option to identify and only use a set number of scenes with the best cloud cover.

--sincelastrun

If specified, check the scenes list from the workspace and if it exists, only ingest data since the last scene.

--usecachedscenes

If specified, run against the existing scenes catalog in the workspace directory if it exists.

-ws, --workspaceDir <path>

A local directory to write temporary files needed for Landsat 8 ingest. Default is landsat8.

EXAMPLES

Analyze the B8 band of Landsat raster data over a bounding box that roughly surrounds Berlin, Germany:

geowave util landsat analyze --nbestperspatial --nbestscenes 1 --cql "BBOX(shape,13.0535,52.3303,13.7262,52.6675) AND band='B8' AND cloudCover>0" -ws ./landsat

Download Landsat 8

NAME

geowave-util-landsat-download - Download Landsat 8 imagery to a local directory

SYNOPSIS

geowave util landsat download [options]

DESCRIPTION

This command downloads Landsat 8 imagery to a local directory.

OPTIONS

--cql <filter>

An optional CQL expression to filter the ingested imagery. The feature type for the expression has the following attributes: shape (Geometry), acquisitionDate (Date), cloudCover (double), processingLevel (String), path (int), and row (int); the feature ID is the entityId of the scene. Additionally, attributes of the individual bands can be used, such as band (String), sizeMB (double), and bandDownloadUrl (String).

--nbestbands <count>

An option to identify and only use a set number of bands with the best cloud cover.

--nbestperspatial

A flag that when applied with --nbestscenes or --nbestbands will aggregate scenes and/or bands by path/row.

--nbestscenes <count>

An option to identify and only use a set number of scenes with the best cloud cover.

--sincelastrun

If specified, check the scenes list from the workspace and if it exists, only ingest data since the last scene.

--usecachedscenes

If specified, run against the existing scenes catalog in the workspace directory if it exists.

-ws, --workspaceDir <path>

A local directory to write temporary files needed for Landsat 8 ingest. Default is landsat8.

EXAMPLES

Download the B8 band of Landsat raster data over a bounding box that roughly surrounds Berlin, Germany:

geowave util landsat download --nbestperspatial --nbestscenes 1 --cql "BBOX(shape,13.0535,52.3303,13.7262,52.6675) AND band='B8' AND cloudCover>0" -ws ./landsat

Ingest Landsat 8

NAME

geowave-util-landsat-ingest - Ingest Landsat 8 imagery and metadata into a GeoWave data store

SYNOPSIS

geowave util landsat ingest [options] <store name> <comma delimited index list>

DESCRIPTION

This command downloads Landsat 8 imagery and then ingests it as raster data into GeoWave. At the same time, it ingests the scene metadata as vector data. The raster and vector data can be ingested into two separate data stores, if desired.

OPTIONS

--converter <converter>

Prior to ingesting an image, this converter will be used to massage the data. The default is not to convert the data.

--coverage <name>

The name to give to each unique coverage. Freemarker templating can be used for variable substitution based on the same attributes used for filtering. The default coverage name is ${entityId}_${band}. If ${band} is unused in the coverage name, all bands will be merged together into the same coverage.

--cql <filter>

An optional CQL expression to filter the ingested imagery. The feature type for the expression has the following attributes: shape (Geometry), acquisitionDate (Date), cloudCover (double), processingLevel (String), path (int), and row (int); the feature ID is the entityId of the scene. Additionally, attributes of the individual bands can be used, such as band (String), sizeMB (double), and bandDownloadUrl (String).

--crop

If specified, use the spatial constraint provided in CQL to crop the image. If no spatial constraint is provided, this will not have an effect.

--histogram

If specified, store the histogram of the values of the coverage so that histogram equalization can be performed.

--nbestbands <count>

An option to identify and only use a set number of bands with the best cloud cover.

--nbestperspatial

A flag that when applied with --nbestscenes or --nbestbands will aggregate scenes and/or bands by path/row.

--nbestscenes <count>

An option to identify and only use a set number of scenes with the best cloud cover.

--overwrite

If specified, overwrite images that are ingested in the local workspace directory. By default it will keep an existing image rather than downloading it again.

--pyramid

If specified, store an image pyramid for the coverage.

--retainimages

If specified, keep the images that are ingested in the local workspace directory. By default it will delete the local file after it is ingested successfully.

--sincelastrun

If specified, check the scenes list from the workspace and if it exists, only ingest data since the last scene.

--skipMerge

By default the ingest will automerge overlapping tiles as a post-processing optimization step for efficient retrieval, but this option will skip the merge process.

--subsample <factor>

Subsample the image prior to ingest by the scale factor provided. The scale factor should be an integer value greater than or equal to 1. Default is 1.

--tilesize <size>

The pixel size for each tile stored in GeoWave. Default is 256.

--usecachedscenes

If specified, run against the existing scenes catalog in the workspace directory if it exists.

--vectorindex <index>

When ingesting both vector and raster data, you may want each to be indexed differently. This will override the index used for vector output.

--vectorstore <store name>

When ingesting both vector and raster data, you may want to ingest the vector data into a different data store. This will override the data store used for vector output.

-ws, --workspaceDir <path>

A local directory to write temporary files needed for Landsat 8 ingest. Default is landsat8.

EXAMPLES

Ingest and crop the B8 band of Landsat raster data over a bounding box that roughly surrounds Berlin, Germany, and output raster data to a landsatraster data store and vector data to a landsatvector data store:

geowave util landsat ingest --nbestperspatial --nbestscenes 1 --usecachedscenes --cql "BBOX(shape,13.0535,52.3303,13.7262,52.6675) AND band='B8' AND cloudCover>0" --crop --retainimages -ws ./landsat --vectorstore landsatvector --pyramid --coverage berlin_mosaic landsatraster spatial-idx
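The ${entityId}_${band} coverage template described above can be illustrated with Python's string.Template, which happens to share the ${name} placeholder syntax with Freemarker (a sketch of the substitution concept only; the scene ID below is a hypothetical example):

```python
from string import Template

# Illustration only: GeoWave evaluates the --coverage template with
# Freemarker server-side; string.Template merely shares the ${name} syntax.
coverage_template = Template("${entityId}_${band}")

# Each scene/band pair yields a distinct coverage name, so bands are
# stored as separate coverages rather than merged together.
name = coverage_template.substitute(entityId="LC80140342014187LGN00", band="B8")
```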

Ingest Landsat 8 Raster

NAME

geowave-util-landsat-ingestraster - Ingest Landsat 8 imagery into a GeoWave data store

SYNOPSIS

geowave util landsat ingestraster [options] <store name> <comma delimited index list>

DESCRIPTION

This command downloads Landsat 8 imagery and then ingests it as raster data into GeoWave.

OPTIONS

--converter <converter>

Prior to ingesting an image, this converter will be used to massage the data. The default is not to convert the data.

--coverage <name>

The name to give to each unique coverage. Freemarker templating can be used for variable substitution based on the same attributes used for filtering. The default coverage name is ${entityId}_${band}. If ${band} is unused in the coverage name, all bands will be merged together into the same coverage.

--cql <filter>

An optional CQL expression to filter the ingested imagery. The feature type for the expression has the following attributes: shape (Geometry), acquisitionDate (Date), cloudCover (double), processingLevel (String), path (int), and row (int); the feature ID is the entityId of the scene. Additionally, attributes of the individual bands can be used, such as band (String), sizeMB (double), and bandDownloadUrl (String).

--crop

If specified, use the spatial constraint provided in CQL to crop the image. If no spatial constraint is provided, this will not have an effect.

--histogram

If specified, store the histogram of the values of the coverage so that histogram equalization can be performed.

--nbestbands <count>

An option to identify and only use a set number of bands with the best cloud cover.

--nbestperspatial

A flag that when applied with --nbestscenes or --nbestbands will aggregate scenes and/or bands by path/row.

--nbestscenes <count>

An option to identify and only use a set number of scenes with the best cloud cover.

--overwrite

If specified, overwrite images that are ingested in the local workspace directory. By default it will keep an existing image rather than downloading it again.

--pyramid

If specified, store an image pyramid for the coverage.

--retainimages

If specified, keep the images that are ingested in the local workspace directory. By default it will delete the local file after it is ingested successfully.

--sincelastrun

If specified, check the scenes list from the workspace and if it exists, only ingest data since the last scene.

--skipMerge

By default the ingest will automerge overlapping tiles as a post-processing optimization step for efficient retrieval, but this option will skip the merge process.

--subsample <factor>

Subsample the image prior to ingest by the scale factor provided. The scale factor should be an integer value greater than or equal to 1. Default is 1.

--tilesize <size>

The pixel size for each tile stored in GeoWave. Default is 512.

--usecachedscenes

If specified, run against the existing scenes catalog in the workspace directory if it exists.

-ws, --workspaceDir <path>

A local directory to write temporary files needed for Landsat 8 ingest. Default is landsat8.

EXAMPLES

Ingest and crop the B8 band of Landsat raster data over a bounding box that roughly surrounds Berlin, Germany, and output raster data to a landsatraster data store:

geowave util landsat ingestraster --nbestperspatial --nbestscenes 1 --usecachedscenes --cql "BBOX(shape,13.0535,52.3303,13.7262,52.6675) AND band='B8' AND cloudCover>0" --crop --retainimages -ws ./landsat --pyramid --coverage berlin_mosaic landsatraster spatial-idx

Ingest Landsat 8 Metadata

NAME

geowave-util-landsat-ingestvector - Ingest Landsat 8 scene and band metadata into a data store

SYNOPSIS

geowave util landsat ingestvector [options] <store name> <comma delimited index list>

DESCRIPTION

This command ingests Landsat 8 scene and band metadata into a GeoWave data store.

OPTIONS

--cql <filter>

An optional CQL expression to filter the ingested imagery. The feature type for the expression has the following attributes: shape (Geometry), acquisitionDate (Date), cloudCover (double), processingLevel (String), path (int), and row (int); the feature ID is the entityId of the scene. Additionally, attributes of the individual bands can be used, such as band (String), sizeMB (double), and bandDownloadUrl (String).

--nbestbands <count>

An option to identify and only use a set number of bands with the best cloud cover.

--nbestperspatial

A flag that when applied with --nbestscenes or --nbestbands will aggregate scenes and/or bands by path/row.

--nbestscenes <count>

An option to identify and only use a set number of scenes with the best cloud cover.

--sincelastrun

If specified, check the scenes list from the workspace and if it exists, only ingest data since the last scene.

--usecachedscenes

If specified, run against the existing scenes catalog in the workspace directory if it exists.

-ws, --workspaceDir <path>

A local directory to write temporary files needed for Landsat 8 ingest. Default is landsat8.

EXAMPLES

Ingest scene and band metadata of the B8 band of Landsat raster data over a bounding box that roughly surrounds Berlin, Germany to a landsatvector data store:

geowave util landsat ingestvector --nbestperspatial --nbestscenes 1 --usecachedscenes --cql "BBOX(shape,13.0535,52.3303,13.7262,52.6675) AND band='B8' AND cloudCover>0" -ws ./landsat landsatvector spatial-idx

gRPC Commands

Commands for working with the gRPC service.

Start gRPC Server

NAME

geowave-util-grpc-start - Start the GeoWave gRPC server

SYNOPSIS

geowave util grpc start [options]

DESCRIPTION

This command starts the GeoWave gRPC server on a given port number. Remote gRPC clients can interact with GeoWave from this service.

OPTIONS

-n, --nonBlocking

If specified, runs the server in non-blocking mode.

-p, --port <port>

The port number the server should run on. Default is 8980.

EXAMPLES

Run a gRPC server on port 8980:

geowave util grpc start -p 8980

Stop gRPC Server

NAME

geowave-util-grpc-stop - Stop the GeoWave gRPC server

SYNOPSIS

geowave util grpc stop

DESCRIPTION

Shuts down the GeoWave gRPC server.

EXAMPLES

Shut down the gRPC server:

geowave util grpc stop

FileSystem Commands

Commands for working with the FileSystem data store.

List Available FileSystem Data Formats

NAME

geowave-util-filesystem-listformats - List available filesystem data formats

SYNOPSIS

geowave util filesystem listformats

DESCRIPTION

This command lists the available data formats that can be used with the --format option of the FileSystem data store.

EXAMPLES

List available filesystem data formats:

geowave util filesystem listformats