GeoWave Command-Line Interface
Overview
The Command-Line Interface provides a way to execute a multitude of common operations on GeoWave data stores without having to use the Programmatic API. It allows users to manage data stores, indices, statistics, and more. All command options that are marked with * are required for the command to execute.
Configuration
The CLI uses a local configuration file to store sets of data store connection parameters aliased by a store name. Most GeoWave commands ask for a store name and use the configuration file to determine which connection parameters should be used. It also stores connection information for GeoServer, AWS, and HDFS for commands that use those services. This configuration file is generally stored in the user’s home directory, although an alternate configuration file can be specified when running commands.
General Usage
The root of all GeoWave CLI commands is the base geowave command.
$ geowave
This will display a list of all available top-level commands along with a brief description of each.
Version
$ geowave --version
The --version flag will display various information about the installed version of GeoWave, including the version, build arguments, and revision information.
General Flags
These flags can be optionally supplied to any GeoWave command, and should be supplied before the command itself.
Config File
The --config-file flag causes GeoWave to use an alternate configuration file. The supplied file path should include the file name (e.g. --config-file /mnt/config.properties). This can be useful if you have multiple projects that use GeoWave and want to keep the configuration for those data stores separate from each other.
$ geowave --config-file <path_to_file> <command>
Help Command
Adding help before any CLI command will show that command's options and their defaults.
$ geowave help <command>
For example, using the help command on index add would result in the following output:
$ geowave help index add
Usage: geowave index add [options] <store name> <index name>
  Options:
    -np, --numPartitions
      The number of partitions. Default partitions will be 1.
      Default: 1
    -ps, --partitionStrategy
      The partition strategy to use. Default will be none.
      Default: NONE
      Possible Values: [NONE, HASH, ROUND_ROBIN]
  * -t, --type
      The type of index, such as spatial, or spatial_temporal
Explain Command
The explain command is similar to the help command in its usage, but shows all options, including hidden ones. It can be a great way to make sure your parameters are correct before issuing a command.
$ geowave explain <command>
For example, if you wanted to add a spatial index to a store named test-store but weren't sure which options were available, you could do the following:
$ geowave explain index add -t spatial test-store spatial-idx
Command: geowave [options] <subcommand> ...

                VALUE NEEDED   PARAMETER NAMES
----------------------------------------------
{                          }   -cf, --config-file,
{                          }   --debug,
{                          }   --version,

Command: add [options]

                VALUE NEEDED   PARAMETER NAMES
----------------------------------------------
{                 EPSG:4326}   -c, --crs,
{                     false}   -fp, --fullGeometryPrecision,
{                         7}   -gp, --geometryPrecision,
{                         1}   -np, --numPartitions,
{                      NONE}   -ps, --partitionStrategy,
{                     false}   --storeTime,
{                   spatial}   -t, --type,

Expects: <store name> <index name>
Specified: test-store spatial-idx
The output is broken down into two sections. The first section shows all of the options available on the geowave command. If you wanted to use any of these options, they would need to be specified before index add. The second section shows all of the options available on the index add command. Some commands contain options that, when specified, may reveal more options. In this case, the -t spatial option has revealed some additional configuration options that we could apply to the spatial index. Another command where this is useful is the store add command, where each data store type specified by the -t <store_type> option has a different set of configuration options.
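For instance, to see every option that a given store type reveals before committing to a store add (the store name here is illustrative):
$ geowave explain store add -t redis my-store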
Config Commands
Commands that affect the local GeoWave configuration.
Configure AWS
Configure GeoServer
DESCRIPTION
This command creates a local configuration for connecting to GeoServer, which is used by the geoserver (gs) commands.
OPTIONS
- -p, --password <password>
  GeoServer password. Can be specified as 'pass:<password>', 'file:<local file containing the password>', 'propfile:<local properties file containing the password>:<property file key>', 'env:<variable containing the password>', or 'stdin'.
- -u, --username <username>
  GeoServer username.
- -ws, --workspace <workspace>
  GeoServer default workspace.
SSL CONFIGURATION OPTIONS
- --sslKeyManagerAlgorithm <algorithm>
  Specify the algorithm to use for the keystore.
- --sslKeyManagerProvider <provider>
  Specify the key manager factory provider.
- --sslKeyPassword <password>
  Specify the password to be used to access the server certificate from the specified keystore file. Can be specified as 'pass:<password>', 'file:<local file containing the password>', 'propfile:<local properties file containing the password>:<property file key>', 'env:<variable containing the password>', or 'stdin'.
- --sslKeyStorePassword <password>
  Specify the password to use to access the keystore file. Can be specified as 'pass:<password>', 'file:<local file containing the password>', 'propfile:<local properties file containing the password>:<property file key>', 'env:<variable containing the password>', or 'stdin'.
- --sslKeyStorePath <path>
  Specify the absolute path to where the keystore file is located on the system. The keystore contains the server certificate to be loaded.
- --sslKeyStoreProvider <provider>
  Specify the name of the keystore provider to be used for the server certificate.
- --sslKeyStoreType <type>
  The type of keystore file to be used for the server certificate.
- --sslSecurityProtocol <protocol>
  Specify the Transport Layer Security (TLS) protocol to use when connecting to the server. By default, the system will use TLS.
- --sslTrustManagerAlgorithm <algorithm>
  Specify the algorithm to use for the truststore.
- --sslTrustManagerProvider <provider>
  Specify the trust manager factory provider.
- --sslTrustStorePassword <password>
  Specify the password to use to access the truststore file. Can be specified as 'pass:<password>', 'file:<local file containing the password>', 'propfile:<local properties file containing the password>:<property file key>', 'env:<variable containing the password>', or 'stdin'.
- --sslTrustStorePath <path>
  Specify the absolute path to where the truststore file is located on the system. The truststore file is used to validate client certificates.
- --sslTrustStoreProvider <provider>
  Specify the name of the truststore provider to be used for the server certificate.
- --sslTrustStoreType <type>
  Specify the type of keystore used for the truststore, e.g., JKS (Java KeyStore).
EXAMPLES
Configure GeoWave to use locally running GeoServer:
geowave config geoserver "http://localhost:8080/geoserver"
Configure GeoWave to use GeoServer running on another host:
geowave config geoserver "${HOSTNAME}:8080"
Configure GeoWave to use a particular workspace on a GeoServer instance:
geowave config geoserver -ws myWorkspace "http://localhost:8080/geoserver"
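Configure GeoWave to use a GeoServer instance that requires authentication, reading the password from an environment variable (the variable name GEOSERVER_PASS is illustrative):
geowave config geoserver -u admin -p env:GEOSERVER_PASS "http://localhost:8080/geoserver"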
Configure HDFS
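A minimal sketch of pointing GeoWave at an HDFS instance, assuming the command takes the HDFS host and port as its only argument:
geowave config hdfs localhost:8020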
List Configured Properties
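A minimal sketch of displaying the locally configured properties, assuming the subcommand is config list:
geowave config list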
Configure Cryptography Key
NAME
geowave-config-newcryptokey - generate a new security cryptography key for use with configuration properties
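For example, assuming the subcommand maps directly to the name above:
geowave config newcryptokey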
Store Commands
Commands for managing GeoWave data stores.
Add Store
DESCRIPTION
This command adds a new store to the GeoWave configuration. The store name can then be used by other commands for interfacing with the configured data store.
OPTIONS
- -d, --default
  Make this the default store in all operations.
- * -t, --type <arg>
  The type of store. A list of available store types can be found using the store listplugins command.
All core data stores have these options:
- --gwNamespace <namespace>
  The GeoWave namespace. By default, no namespace is used.
- --enableServerSideLibrary <enabled>
  Enable server-side operations if possible. Default is true.
- --enableSecondaryIndexing
  If specified, secondary indexing will be used.
- --enableVisibility <enabled>
  If specified, visibility will be explicitly enabled or disabled. Default is unspecified.
- --maxRangeDecomposition <count>
  The maximum number of ranges to use when breaking down queries.
- --aggregationMaxRangeDecomposition <count>
  The maximum number of ranges to use when breaking down aggregation queries.
When the accumulo type option is used, additional options are:
- * -i, --instance <instance>
  The Accumulo instance ID.
- -u, --user <user>
  A valid Accumulo user ID. If not given and using SASL, the active Kerberos user will be used.
- -k, --keytab <keytab>
  Path to a keytab file for Kerberos authentication. If using SASL, this is required.
- --sasl <sasl>
  Use SASL to connect to Accumulo (Kerberos).
- -p, --password <password>
  The password for the user. Can be specified as 'pass:<password>', 'file:<local file containing the password>', 'propfile:<local properties file containing the password>:<property file key>', 'env:<variable containing the password>', or 'stdin'.
- * -z, --zookeeper <servers>
  A comma-separated list of ZooKeeper servers that the Accumulo instance is using.
When the hbase type option is used, additional options are:
- * -z, --zookeeper <servers>
  A comma-separated list of ZooKeeper servers that the HBase instance is using.
- --coprocessorJar <path>
  Path (HDFS URL) to the JAR containing coprocessor classes.
- --disableVerifyCoprocessors
  If specified, disable coprocessor verification, which ensures that coprocessors have been added to the HBase table prior to executing server-side operations.
- --scanCacheSize <size>
  The number of rows passed to each scanner (higher values will enable faster scanners, but will use more memory).
When the redis type option is used, additional options are:
- * -a, --address <address>
  The address to connect to, such as redis://127.0.0.1:6379.
- --compression <compression>
  The compression to use. Possible values are snappy, lz4, and none. Default is snappy.
- --serialization <serialization>
  Can be "fst" or "jdk". Default is fst. This serialization codec is only used for the data index when secondary indexing is enabled.
- --username <username>
  A Redis username to be used with Redis AUTH.
- --password <password>
  The password for the user. Can be specified as 'pass:<password>', 'file:<local file containing the password>', 'propfile:<local properties file containing the password>:<property file key>', 'env:<variable containing the password>', or 'stdin'.
When the rocksdb type option is used, additional options are:
- --dir <path>
  The directory to read/write to. Defaults to "rocksdb" in the working directory.
- --compactOnWrite <enabled>
  Whether to compact on every write; if false, it will only compact on merge. Default is true.
- --batchWriteSize <count>
  The size (in records) of each batched write. Anything less than or equal to 1 will use synchronous single-record writes without batching. Default is 1000.
When the filesystem type option is used, additional options are:
- --dir <path>
  The directory to read/write to. Defaults to "geowave" in the working directory.
- --format <format>
  Optionally use a formatter configured with Java SPI of type org.locationtech.geowave.datastore.filesystem.FileSystemDataFormatterSpi. Defaults to 'binary', which is a compact GeoWave serialization. Use geowave util filesystem listformats to see available formats.
When the cassandra type option is used, additional options are:
- --contactPoints <contact points>
  A single contact point or a comma-delimited set of contact points to connect to the Cassandra cluster.
- --datacenter <datacenter>
  The local datacenter.
- --replicas <count>
  The number of replicas to use when creating a new keyspace. Default is 3.
- --durableWrites <enabled>
  Whether to write to the commit log for durability; configured only on creation of a new keyspace. Default is true.
- --batchWriteSize <count>
  The number of inserts in a batch write. Default is 50.
- --gcGraceSeconds <count>
  The gc_grace_seconds applied to each Cassandra table. Defaults to 10 days; major compaction should be triggered at least that often.
- --compactionStrategy <compactionStrategy>
  The compaction strategy applied to each Cassandra table. Available options are LeveledCompactionStrategy, SizeTieredCompactionStrategy, and TimeWindowCompactionStrategy.
- --tableOptions <tableOptions>
  Any general table options, as 'key=value', applied to each Cassandra table.
When the dynamodb type option is used, additional options are:
- * --endpoint <endpoint>
  [REQUIRED (or --region)] The endpoint to connect to.
- * --region <region>
  [REQUIRED (or --endpoint)] The AWS region to use.
- --initialWriteCapacity <count>
  The maximum number of writes consumed per second before throttling occurs. Default is 5.
- --initialReadCapacity <count>
  The maximum number of strongly consistent reads consumed per second before throttling occurs. Default is 5.
- --maxConnections <count>
  The maximum number of open HTTP(S) connections active at any given time. Default is 50.
- --protocol <protocol>
  The protocol to use. Possible values are HTTP and HTTPS; default is HTTPS.
- --cacheResponseMetadata <enabled>
  Whether to cache responses from AWS. High-performance systems can disable this, but debugging will be more difficult. Default is true.
When the kudu type option is used, additional options are:
- * --kuduMaster <url>
  A URL for the Kudu master node.
When the bigtable type option is used, additional options are:
- --projectId <project>
  The Bigtable project to connect to. Default is geowave-bigtable-project-id.
- --instanceId <instance>
  The Bigtable instance to connect to. Default is geowave-bigtable-instance-id.
- --scanCacheSize <size>
  The number of rows passed to each scanner (higher values will enable faster scanners, but will use more memory).
EXAMPLES
Add a data store called example that uses a locally running Accumulo instance:
geowave store add -t accumulo --zookeeper localhost:2181 --instance accumulo --user root --password secret example
Add a data store called example that uses a locally running HBase instance:
geowave store add -t hbase --zookeeper localhost:2181 example
Add a data store called example that uses a RocksDB database in the current directory:
geowave store add -t rocksdb example
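Add a data store called example backed by DynamoDB in a specific AWS region (the region value is illustrative):
geowave store add -t dynamodb --region us-east-1 example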
Copy Store with MapReduce
DESCRIPTION
This command copies all of the data from one data store to another existing data store using MapReduce.
OPTIONS
- * --hdfsHostPort <host>
  The HDFS host and port.
- * --jobSubmissionHostPort <host>
  The job submission tracker host and port.
- --maxSplits <count>
  The maximum partitions for the input data.
- --minSplits <count>
  The minimum partitions for the input data.
- --numReducers <count>
  Number of threads writing at a time. Default is 8.
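A minimal sketch, assuming the subcommand is store copy and that the source and destination store names are passed as arguments (host names and ports are illustrative):
geowave store copy --hdfsHostPort localhost:8020 --jobSubmissionHostPort localhost:8032 source-store dest-store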
Copy Store Configuration
DESCRIPTION
This command copies and modifies an existing GeoWave store configuration. Configuration options can be overridden as part of the copy by specifying them after the new name, such as store copycfg old new --gwNamespace new_namespace. It is important to note that this command does not copy data, only the data store configuration, as shown below.
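For example, to duplicate the configuration of a store called old as new while overriding its namespace:
geowave store copycfg old new --gwNamespace new_namespace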
Remove Store
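A minimal sketch, assuming the subcommand is store rm and takes the store name as its only argument:
geowave store rm example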
Index Commands
Commands for managing GeoWave indices.
Add Index
OPTIONS
- -np, --numPartitions <count>
  The number of partitions. Default is 1.
- -ps, --partitionStrategy <strategy>
  The partition strategy to use. Possible values are NONE, HASH, and ROUND_ROBIN; default is NONE.
- * -t, --type <type>
  The type of index, such as spatial, temporal, or spatial_temporal.
When the spatial type option is used, additional options are:
- -c, --crs <crs>
  The native Coordinate Reference System used within the index. All spatial data will be projected into this CRS for appropriate indexing as needed. Default is EPSG:4326.
- -fp, --fullGeometryPrecision
  If specified, geometry will be encoded losslessly. Uses more disk space.
- -gp, --geometryPrecision <precision>
  The maximum precision of the geometry when encoding. Lower precision will save more disk space when encoding. Possible values are between -8 and 7; default is 7.
- --storeTime
  If specified, the index will store temporal values. This allows spatial-temporal queries to run slightly more efficiently, although if spatial-temporal queries are a common use case, a separate spatial-temporal index is recommended.
When the spatial_temporal type option is used, additional options are:
- -c, --crs <crs>
  The native Coordinate Reference System used within the index. All spatial data will be projected into this CRS for appropriate indexing as needed. Default is EPSG:4326.
- -fp, --fullGeometryPrecision
  If specified, geometry will be encoded losslessly. Uses more disk space.
- -gp, --geometryPrecision <precision>
  The maximum precision of the geometry when encoding. Lower precision will save more disk space when encoding. Possible values are between -8 and 7; default is 7.
- --bias <bias>
  The bias of the spatial-temporal index. More precision can be given to time or space if necessary. Possible values are TEMPORAL, BALANCED, and SPATIAL; default is BALANCED.
- --maxDuplicates <count>
  The maximum number of duplicates per dimension range. The default is 2 per range (for example, lines and polygons with timestamp data would be up to 4 because there are 2 dimensions, and line/polygon time-range data would be 8).
- --period <periodicity>
  The periodicity of the temporal dimension. Because time is continuous, it is binned at this interval. Possible values are MINUTE, HOUR, DAY, WEEK, MONTH, YEAR, and DECADE; default is YEAR.
When the temporal type option is used, additional options are:
- --maxDuplicates <count>
  The maximum number of duplicates per dimension range. The default is 2 per range (for example, lines and polygons with timestamp data would be up to 4 because there are 2 dimensions, and line/polygon time-range data would be 8).
- --period <periodicity>
  The periodicity of the temporal dimension. Because time is continuous, it is binned at this interval. Possible values are MINUTE, HOUR, DAY, WEEK, MONTH, YEAR, and DECADE; default is YEAR.
- --noTimeRange
  If specified, the index will not support time ranges, which can be more efficient.
EXAMPLES
Add a spatial index called spatial_idx with CRS EPSG:3857 to the example data store:
geowave index add -t spatial -c EPSG:3857 example spatial_idx
Add a spatial-temporal index called st_idx with a periodicity of MONTH to the example data store:
geowave index add -t spatial_temporal --period MONTH example st_idx
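Add a temporal index called temporal_idx that does not support time ranges to the example data store:
geowave index add -t temporal --noTimeRange example temporal_idx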
Type Commands
Commands for managing GeoWave types.
Add Type
DESCRIPTION
This command is similar to ingest localToGW, but does not ingest any data. It will use the specified format plugins to determine the available data types and add them to the data store. This can be useful if a user would like to add statistics to a type prior to ingest. Note that because this command uses the same format plugins as the ingest system, many of the option descriptions will mention ingest, but this command will only add data types.
OPTIONS
- -x, --extension <extensions>
  Individual or comma-delimited set of file extensions to accept.
- -f, --formats <formats>
  Explicitly set the formats by name (or multiple comma-delimited formats). If not set, all available formats will be used.
- -v, --visibility <visibility>
  The global visibility of the data ingested (optional; if not specified, the data will be unrestricted).
- -fv, --fieldVisibility <visibility>
  Specify the visibility of a specific field in the format <fieldName>:<visibility>. This option can be specified multiple times for different fields.
- -va, --visibilityAttribute <field>
  Specify a field that contains visibility information for the whole row. If specified, any field visibilities defined by -fv will be ignored.
- --jsonVisibilityAttribute
  If specified, the value of the visibility field defined by -va will be treated as a JSON object with keys that represent fields and values that represent their visibility.
When the avro format is used, additional options are:
- --avro.avro
  If specified, indicates that the operation should use Avro feature serialization.
- --avro.cql <filter>
  An optional CQL filter. If specified, only data matching the filter will be ingested.
- --avro.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all type names will be ingested.
- --avro.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --avro.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --avro.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the gdelt format is used, additional options are:
- --gdelt.avro
  A flag to indicate whether Avro feature serialization should be used.
- --gdelt.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --gdelt.extended
  A flag to indicate whether the extended data format should be used.
- --gdelt.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --gdelt.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --gdelt.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --gdelt.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the geolife format is used, additional options are:
- --geolife.avro
  A flag to indicate whether Avro feature serialization should be used.
- --geolife.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --geolife.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --geolife.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --geolife.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --geolife.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the geotools-raster format is used, additional options are:
- --geotools-raster.coverage <name>
  Coverage name for the raster. Default is the name of the file.
- --geotools-raster.crs <crs>
  A CRS override for the provided raster file.
- --geotools-raster.histogram
  If specified, build a histogram of samples per band on ingest for performing band equalization.
- --geotools-raster.mergeStrategy <strategy>
  The tile merge strategy to use for mosaic. Specifying no-data will mosaic the most recent tile over the previous tiles, except where there are no-data values. By default, none is used.
- --geotools-raster.nodata <value>
  Optional parameter to set no-data values. If one value is given, it is applied to each band; if multiple are given, the first totalNoDataValues/totalBands values are applied to the first band, and so on, so each band can have multiple differing no-data values if needed.
- --geotools-raster.pyramid
  If specified, build an image pyramid on ingest for quick reduced-resolution queries.
- --geotools-raster.separateBands
  If specified, separate each band into its own coverage name. By default, the coverage name will have _Bn appended to it, where n is the band's index.
- --geotools-raster.tileSize <size>
  The tile size of stored tiles. Default is 256.
When the geotools-vector format is used, additional options are:
- --geotools-vector.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --geotools-vector.data <fields>
  A map of date field names to the date format of the file. Use commas to separate entries; the first : character in each entry separates the field name from the format. Use \, to include a comma in the format. For example, time:MM:dd:YYYY,time2:YYYY/MM/dd hh:mm:ss configures the fields time and time2 as dates with different formats.
- --geotools-vector.type <types>
  Optional parameter that specifies specific type name(s) from the source file.
When the gpx format is used, additional options are:
- --gpx.avro
  A flag to indicate whether Avro feature serialization should be used.
- --gpx.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --gpx.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --gpx.maxLength <degrees>
  Maximum extent (in both dimensions) for a GPX track, in degrees. Used to remove excessively long GPX tracks.
- --gpx.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --gpx.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --gpx.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the tdrive format is used, additional options are:
- --tdrive.avro
  A flag to indicate whether Avro feature serialization should be used.
- --tdrive.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --tdrive.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --tdrive.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --tdrive.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --tdrive.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the twitter format is used, additional options are:
- --twitter.avro
  A flag to indicate whether Avro feature serialization should be used.
- --twitter.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --twitter.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --twitter.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --twitter.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --twitter.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
EXAMPLES
Add all types for GDELT data from an area around Germany from the gdelt_data directory to a GeoWave data store called example with the spatial-idx index:
geowave type add -f gdelt --gdelt.cql "BBOX(geometry,5.87,47.2,15.04,54.95)" ./gdelt_data example spatial-idx
Add the type from a shapefile called states.shp to the example data store with the spatial-idx index:
geowave type add -f geotools-vector states.shp example spatial-idx
Add the type from a shapefile called states.shp to the example data store with the spatial-idx index, but make the population field require an authorization of "secret" to access:
geowave type add -f geotools-vector -fv population:secret states.shp example spatial-idx
Statistics Commands
Commands to manage GeoWave statistics.
List Stats
DESCRIPTION
This command prints statistics of a GeoWave data store (and optionally of a single type) to the standard output.
OPTIONS
- --limit <limit>
  Limit the number of rows returned. By default, all results will be displayed.
- --csv
  Output statistics in CSV format.
- -t, --type <type>
  The type of the statistic.
- --typeName <name>
  The name of the data type adapter, for field and type statistics.
- --indexName <name>
  The name of the index, for index statistics.
- --fieldName <name>
  The name of the field, for field statistics.
- --tag <tag>
  The tag of the statistic.
- --auth <authorizations>
  The authorizations used when querying statistics.
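A minimal sketch of listing the statistics for a single type in CSV form, assuming the subcommand is stat list (store and type names are illustrative):
geowave stat list example --typeName counties --csv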
List Stat Types
NAME
geowave-stat-listtypes - list statistic types that are compatible with the given data store. If no data store is provided, all registered statistics will be listed.
DESCRIPTION
This command prints statistic types that are compatible with the given options to the standard output.
OPTIONS
- --indexName <name>
  If specified, only statistics that are compatible with this index will be listed.
- --typeName <name>
  If specified, only statistics that are compatible with this type will be listed.
- --fieldName <name>
  If specified, only statistics that are compatible with this field will be displayed.
- -b, --binningStrategies
  If specified, a list of registered binning strategies will be displayed.
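For example, to list the statistic types compatible with a particular type in the example data store:
geowave stat listtypes example --typeName counties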
Add Stat
DESCRIPTION
This command adds a statistic to a GeoWave data store. Each statistic and binning strategy can provide its own options. For a list of available binning strategies and statistics, see geowave stat listtypes.
OPTIONS
- * -t, --type <type>
  The statistic type to add.
- --indexName <name>
  The index for the statistic, if the statistic is an index statistic.
- --typeName <name>
  The type for the statistic, if the statistic is a field or type statistic.
- --fieldName <name>
  The field name for the statistic, if the statistic is a field statistic.
- --tag <tag>
  An optional tag to uniquely identify the statistic. If none is specified, a default will be chosen.
- -b, --binningStrategy <strategy>
  The binning strategy to use for the statistic. If none is specified, the statistic will be aggregated to a single bin.
- -skip, --skipCalculation
  If specified, the statistic will be added without calculating its initial value. This can be useful if you plan on adding several statistics and then running geowave stat recalc.
EXAMPLES
Add a COUNT statistic to the counties type binned by the state_code field in the example data store:
geowave stat add example -t COUNT --typeName counties -b FIELD_VALUE --binField state_code
List the options available for the COUNT statistic and FIELD_VALUE binning strategy:
geowave help stat add example -t COUNT -b FIELD_VALUE
Remove Stat
OPTIONS
- --all
  If specified, all matching statistics will be removed.
- --force
  Force an internal statistic to be removed. IMPORTANT: Removing statistics that are marked as internal can have a detrimental impact on performance!
- -t, --type <type>
  The type of the statistic.
- --typeName <name>
  The name of the data type adapter, for field and type statistics.
- --indexName <name>
  The name of the index, for index statistics.
- --fieldName <name>
  The name of the field, for field statistics.
- --tag <tag>
  The tag of the statistic.
- --auth <authorizations>
  The authorizations used when querying statistics.
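A minimal sketch, assuming the subcommand is stat rm (the statistic type and type name are illustrative):
geowave stat rm example -t COUNT --typeName counties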
Recalculate Stats
DESCRIPTION
This command recalculates the statistics of an existing GeoWave data store. If a type name is provided as an option, only the statistics for that type will be recalculated.
OPTIONS
- --all
  If specified, all matching statistics will be recalculated.
- -t, --type <type>
  The type of the statistic.
- --typeName <name>
  The name of the data type adapter, for field and type statistics.
- --indexName <name>
  The name of the index, for index statistics.
- --fieldName <name>
  The name of the field, for field statistics.
- --tag <tag>
  The tag of the statistic.
- --auth <authorizations>
  The authorizations used when querying statistics.
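For example, to recalculate the statistics of a single type, using the stat recalc subcommand referenced above:
geowave stat recalc example --typeName counties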
Compact Stats
DESCRIPTION
Whenever new data is ingested into a type, additional statistics are calculated for the new data. If data is frequently ingested, the number of rows that need to be merged to compute a statistic may begin to have an impact on performance. This command aggregates all of those statistic values down into a single value to improve performance in those cases.
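A minimal sketch, assuming the subcommand is stat compact and takes the store name as its only argument:
geowave stat compact example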
Ingest Commands
Commands that ingest data directly into GeoWave or stage data to be ingested into GeoWave.
Ingest Local to GeoWave
SYNOPSIS
geowave ingest localToGW [options] <file or directory> <store name> <comma delimited index list>
DESCRIPTION
This command runs the ingest code (parse to features, load features to GeoWave) against local file system content.
OPTIONS
- -t, --threads <count>
  Number of threads to use for ingest. Default is 1.
- -x, --extension <extensions>
  Individual or comma-delimited set of file extensions to accept.
- -f, --formats <formats>
  Explicitly set the ingest formats by name (or multiple comma-delimited formats). If not set, all available ingest formats will be used.
- -v, --visibility <visibility>
  The global visibility of the data ingested (optional; if not specified, the data will be unrestricted).
- -fv, --fieldVisibility <visibility>
  Specify the visibility of a specific field in the format <fieldName>:<visibility>. This option can be specified multiple times for different fields.
- -va, --visibilityAttribute <field>
  Specify a field that contains visibility information for the whole row. If specified, any field visibilities defined by -fv will be ignored.
- --jsonVisibilityAttribute
  If specified, the value of the visibility field defined by -va will be treated as a JSON object with keys that represent fields and values that represent their visibility.
When the avro format is used, additional options are:
- --avro.avro
  If specified, indicates that the operation should use Avro feature serialization.
- --avro.cql <filter>
  An optional CQL filter. If specified, only data matching the filter will be ingested.
- --avro.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all type names will be ingested.
- --avro.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --avro.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --avro.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the gdelt format is used, additional options are:
- --gdelt.avro
  A flag to indicate whether Avro feature serialization should be used.
- --gdelt.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --gdelt.extended
  A flag to indicate whether the extended data format should be used.
- --gdelt.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --gdelt.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --gdelt.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --gdelt.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the geolife format is used, additional options are:
- --geolife.avro
  A flag to indicate whether Avro feature serialization should be used.
- --geolife.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --geolife.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --geolife.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --geolife.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --geolife.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the geotools-raster format is used, additional options are:
- --geotools-raster.coverage <name>
  Coverage name for the raster. Default is the name of the file.
- --geotools-raster.crs <crs>
  A CRS override for the provided raster file.
- --geotools-raster.histogram
  If specified, build a histogram of samples per band on ingest for performing band equalization.
- --geotools-raster.mergeStrategy <strategy>
  The tile merge strategy to use for mosaic. Specifying no-data will mosaic the most recent tile over the previous tiles, except where there are no-data values. By default, none is used.
- --geotools-raster.nodata <value>
  Optional parameter to set no-data values. If one value is given, it is applied to each band; if multiple are given, the first totalNoDataValues/totalBands values are applied to the first band, and so on, so each band can have multiple differing no-data values if needed.
- --geotools-raster.pyramid
  If specified, build an image pyramid on ingest for quick reduced-resolution queries.
- --geotools-raster.separateBands
  If specified, separate each band into its own coverage name. By default, the coverage name will have _Bn appended to it, where n is the band's index.
- --geotools-raster.tileSize <size>
  The tile size of stored tiles. Default is 256.
When the geotools-vector format is used, additional options are:
- --geotools-vector.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --geotools-vector.data <fields>
  A map of date field names to the date format of the file. Use commas to separate entries; the first : character in each entry separates the field name from the format. Use \, to include a comma in the format. For example, time:MM:dd:YYYY,time2:YYYY/MM/dd hh:mm:ss configures the fields time and time2 as dates with different formats.
- --geotools-vector.type <types>
  Optional parameter that specifies specific type name(s) from the source file.
When the gpx format is used, additional options are:
- --gpx.avro
  A flag to indicate whether Avro feature serialization should be used.
- --gpx.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --gpx.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --gpx.maxLength <degrees>
  Maximum extent (in both dimensions) for a GPX track, in degrees. Used to remove excessively long GPX tracks.
- --gpx.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --gpx.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --gpx.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the tdrive format is used, additional options are:
- --tdrive.avro
  A flag to indicate whether Avro feature serialization should be used.
- --tdrive.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --tdrive.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --tdrive.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --tdrive.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --tdrive.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the twitter format is used, additional options are:
- --twitter.avro
  A flag to indicate whether Avro feature serialization should be used.
- --twitter.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --twitter.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --twitter.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --twitter.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --twitter.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
EXAMPLES
Ingest GDELT data from an area around Germany from the gdelt_data directory into a GeoWave data store called example in the spatial-idx index:
geowave ingest localToGW -f gdelt --gdelt.cql "BBOX(geometry,5.87,47.2,15.04,54.95)" ./gdelt_data example spatial-idx
Ingest a shapefile called states.shp into the example data store in the spatial-idx index:
geowave ingest localToGW -f geotools-vector states.shp example spatial-idx
Ingest Kafka to GeoWave
OPTIONS
- --bootstrapServers <brokers>
  This is for bootstrapping, and the producer will only use it for getting metadata (topics, partitions, and replicas). The socket connections for sending the actual data will be established based on the broker information returned in the metadata. The format is host1:port1,host2:port2, and the list can be a subset of brokers or a VIP pointing to a subset of brokers.
- --autoOffsetReset <offset>
  What to do when there is no initial offset in ZooKeeper or if an offset is out of range. If earliest is used, automatically reset the offset to the smallest offset. If latest is used, automatically reset the offset to the largest offset. If none is used, don't reset the offset. Otherwise, throw an exception to the consumer.
- --batchSize <size>
  The data will automatically flush after this number of entries. Default is 10,000.
- --consumerTimeoutMs <timeout>
  By default, this value is -1 and a consumer blocks indefinitely if no new message is available for consumption. By setting the value to a positive integer, a timeout exception is thrown to the consumer if no message is available for consumption after the specified timeout value.
- --maxPartitionFetchBytes <bytes>
  The number of bytes of messages to attempt to fetch for each topic-partition in each fetch request. These bytes will be read into memory for each partition, so this helps control the memory used by the consumer. The fetch request size must be at least as large as the maximum message size the server allows, or else it is possible for the producer to send messages larger than the consumer can fetch.
- --groupId <id>
  A string that uniquely identifies the group of consumer processes to which this consumer belongs. By setting the same group ID, multiple processes indicate that they are all part of the same consumer group.
- * --kafkaprops <file>
  Properties file containing Kafka properties.
- --reconnectOnTimeout
  If specified, when the consumer timeout occurs (based on the Kafka property consumer.timeout.ms), a flush will occur and the consumer will immediately reconnect.
- -x, --extension <extensions>
  Individual or comma-delimited set of file extensions to accept.
- -f, --formats <formats>
  Explicitly set the ingest formats by name (or multiple comma-delimited formats). If not set, all available ingest formats will be used.
- -v, --visibility <visibility>
  The global visibility of the data ingested (optional; if not specified, the data will be unrestricted).
- -fv, --fieldVisibility <visibility>
  Specify the visibility of a specific field in the format <fieldName>:<visibility>. This option can be specified multiple times for different fields.
- -va, --visibilityAttribute <field>
  Specify a field that contains visibility information for the whole row. If specified, any field visibilities defined by -fv will be ignored.
- --jsonVisibilityAttribute
  If specified, the value of the visibility field defined by -va will be treated as a JSON object with keys that represent fields and values that represent their visibility.
When the avro format is used, additional options are:
- --avro.avro
  If specified, indicates that the operation should use Avro feature serialization.
- --avro.cql <filter>
  An optional CQL filter. If specified, only data matching the filter will be ingested.
- --avro.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all type names will be ingested.
- --avro.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --avro.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --avro.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the gdelt format is used, additional options are:
- --gdelt.avro
  A flag to indicate whether Avro feature serialization should be used.
- --gdelt.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --gdelt.extended
  A flag to indicate whether the extended data format should be used.
- --gdelt.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --gdelt.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --gdelt.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --gdelt.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the geolife format is used, additional options are:
- --geolife.avro
  A flag to indicate whether Avro feature serialization should be used.
- --geolife.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --geolife.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --geolife.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --geolife.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --geolife.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the geotools-raster format is used, additional options are:
- --geotools-raster.coverage <name>
  Coverage name for the raster. Default is the name of the file.
- --geotools-raster.crs <crs>
  A CRS override for the provided raster file.
- --geotools-raster.histogram
  If specified, build a histogram of samples per band on ingest for performing band equalization.
- --geotools-raster.mergeStrategy <strategy>
  The tile merge strategy to use for mosaic. Specifying no-data will mosaic the most recent tile over the previous tiles, except where there are no-data values. By default, none is used.
- --geotools-raster.nodata <value>
  Optional parameter to set no-data values. If one value is given, it is applied to each band; if multiple are given, the first totalNoDataValues/totalBands values are applied to the first band, and so on, so each band can have multiple differing no-data values if needed.
- --geotools-raster.pyramid
  If specified, build an image pyramid on ingest for quick reduced-resolution queries.
- --geotools-raster.separateBands
  If specified, separate each band into its own coverage name. By default, the coverage name will have _Bn appended to it, where n is the band's index.
- --geotools-raster.tileSize <size>
  The tile size of stored tiles. Default is 256.
When the geotools-vector format is used, additional options are:
- --geotools-vector.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --geotools-vector.data <fields>
  A map of date field names to the date format of the file. Use commas to separate entries; the first : character in each entry separates the field name from the format. Use \, to include a comma in the format. For example, time:MM:dd:YYYY,time2:YYYY/MM/dd hh:mm:ss configures the fields time and time2 as dates with different formats.
- --geotools-vector.type <types>
  Optional parameter that specifies specific type name(s) from the source file.
When the gpx format is used, additional options are:
- --gpx.avro
  A flag to indicate whether Avro feature serialization should be used.
- --gpx.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --gpx.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --gpx.maxLength <degrees>
  Maximum extent (in both dimensions) for a GPX track, in degrees. Used to remove excessively long GPX tracks.
- --gpx.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --gpx.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --gpx.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the tdrive format is used, additional options are:
- --tdrive.avro
  A flag to indicate whether Avro feature serialization should be used.
- --tdrive.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --tdrive.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --tdrive.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --tdrive.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --tdrive.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the twitter format is used, additional options are:
- --twitter.avro
  A flag to indicate whether Avro feature serialization should be used.
- --twitter.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --twitter.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --twitter.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --twitter.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --twitter.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
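A minimal sketch of consuming previously staged GPX data into the example store's spatial-idx index, assuming the subcommand is ingest kafkaToGW and that it follows the store/index argument pattern of localToGW (file and property names are illustrative):
geowave ingest kafkaToGW --kafkaprops kafka.properties -f gpx example spatial-idx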
Stage Local to HDFS
SYNOPSIS
geowave ingest localToHdfs [options] <file or directory> <hdfs host:port> <path to base directory to write to>
OPTIONS
- -x, --extension <extensions>
  Individual or comma-delimited set of file extensions to accept.
- -f, --formats <formats>
  Explicitly set the ingest formats by name (or multiple comma-delimited formats). If not set, all available ingest formats will be used.
When the avro format is used, additional options are:
- --avro.avro
  If specified, indicates that the operation should use Avro feature serialization.
- --avro.cql <filter>
  An optional CQL filter. If specified, only data matching the filter will be ingested.
- --avro.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all type names will be ingested.
- --avro.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --avro.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --avro.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the gdelt format is used, additional options are:
- --gdelt.avro
  A flag to indicate whether Avro feature serialization should be used.
- --gdelt.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --gdelt.extended
  A flag to indicate whether the extended data format should be used.
- --gdelt.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --gdelt.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --gdelt.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --gdelt.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the geolife format is used, additional options are:
- --geolife.avro
  A flag to indicate whether Avro feature serialization should be used.
- --geolife.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --geolife.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --geolife.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --geolife.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --geolife.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the geotools-raster format is used, additional options are:
- --geotools-raster.coverage <name>
  Coverage name for the raster. Default is the name of the file.
- --geotools-raster.crs <crs>
  A CRS override for the provided raster file.
- --geotools-raster.histogram
  If specified, build a histogram of samples per band on ingest for performing band equalization.
- --geotools-raster.mergeStrategy <strategy>
  The tile merge strategy to use for mosaic. Specifying no-data will mosaic the most recent tile over the previous tiles, except where there are no-data values. By default, none is used.
- --geotools-raster.nodata <value>
  Optional parameter to set no-data values. If one value is given, it is applied to each band; if multiple are given, the first totalNoDataValues/totalBands values are applied to the first band, and so on, so each band can have multiple differing no-data values if needed.
- --geotools-raster.pyramid
  If specified, build an image pyramid on ingest for quick reduced-resolution queries.
- --geotools-raster.separateBands
  If specified, separate each band into its own coverage name. By default, the coverage name will have _Bn appended to it, where n is the band's index.
- --geotools-raster.tileSize <size>
  The tile size of stored tiles. Default is 256.
When the geotools-vector format is used, additional options are:
- --geotools-vector.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --geotools-vector.data <fields>
  A map of date field names to the date format of the file. Use commas to separate entries; the first : character in each entry separates the field name from the format. Use \, to include a comma in the format. For example, time:MM:dd:YYYY,time2:YYYY/MM/dd hh:mm:ss configures the fields time and time2 as dates with different formats.
- --geotools-vector.type <types>
  Optional parameter that specifies specific type name(s) from the source file.
When the gpx format is used, additional options are:
- --gpx.avro
  A flag to indicate whether Avro feature serialization should be used.
- --gpx.cql <filter>
  A CQL filter; only data matching this filter will be ingested.
- --gpx.typename <types>
  A comma-delimited set of type names to ingest; feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --gpx.maxLength <degrees>
  Maximum extent (in both dimensions) for a GPX track, in degrees. Used to remove excessively long GPX tracks.
- --gpx.maxVertices <count>
  Maximum number of vertices to allow for a feature. Features with a vertex count above this will be discarded.
- --gpx.minSimpVertices <count>
  Minimum vertex count to qualify for geometry simplification.
- --gpx.tolerance <tolerance>
  Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the tdrive
format is used, additional options are:
- --tdrive.avro
-
A flag to indicate whether Avro feature serialization should be used.
- --tdrive.cql <filter>
-
A CQL filter; only data matching this filter will be ingested.
- --tdrive.typename <types>
-
A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --tdrive.maxVertices <count>
-
Maximum number of vertices to allow for the feature. Features with a vertex count over this value will be discarded.
- --tdrive.minSimpVertices <count>
-
Minimum vertex count to qualify for geometry simplification.
- --tdrive.tolerance <tolerance>
-
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the twitter
format is used, additional options are:
- --twitter.avro
-
A flag to indicate whether Avro feature serialization should be used.
- --twitter.cql <filter>
-
A CQL filter; only data matching this filter will be ingested.
- --twitter.typename <types>
-
A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --twitter.maxVertices <count>
-
Maximum number of vertices to allow for the feature. Features with a vertex count over this value will be discarded.
- --twitter.minSimpVertices <count>
-
Minimum vertex count to qualify for geometry simplification.
- --twitter.tolerance <tolerance>
-
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
Stage Local to Kafka
OPTIONS
- * --kafkaprops <file>
-
Properties file containing Kafka properties.
- --bootstrapServers <brokers>
-
This is used for bootstrapping, and the producer will only use it for getting metadata (topics, partitions, and replicas). The socket connections for sending the actual data will be established based on the broker information returned in the metadata. The format is host1:port1,host2:port2, and the list can be a subset of brokers or a VIP pointing to a subset of brokers.
- --retryBackoffMs <time>
-
The amount of time to wait before attempting to retry a failed produce request to a given topic partition. This avoids repeated sending-and-failing in a tight loop.
- -x, --extension <extensions>
-
Individual or comma-delimited set of file extensions to accept.
- -f, --formats <formats>
-
Explicitly set the ingest formats by name (or multiple comma-delimited formats). If not set, all available ingest formats will be used.
When the avro
format is used, additional options are:
- --avro.avro
-
If specified, indicates that the operation should use Avro feature serialization.
- --avro.cql <filter>
-
An optional CQL filter. If specified, only data matching the filter will be ingested.
- --avro.typename <types>
-
A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all type names will be ingested.
- --avro.maxVertices <count>
-
Maximum number of vertices to allow for the feature. Features with a vertex count over this value will be discarded.
- --avro.minSimpVertices <count>
-
Minimum vertex count to qualify for geometry simplification.
- --avro.tolerance <tolerance>
-
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the gdelt
format is used, additional options are:
- --gdelt.avro
-
A flag to indicate whether Avro feature serialization should be used.
- --gdelt.cql <filter>
-
A CQL filter; only data matching this filter will be ingested.
- --gdelt.extended
-
A flag to indicate whether the extended data format should be used.
- --gdelt.typename <types>
-
A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --gdelt.maxVertices <count>
-
Maximum number of vertices to allow for the feature. Features with a vertex count over this value will be discarded.
- --gdelt.minSimpVertices <count>
-
Minimum vertex count to qualify for geometry simplification.
- --gdelt.tolerance <tolerance>
-
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the geolife
format is used, additional options are:
- --geolife.avro
-
A flag to indicate whether Avro feature serialization should be used.
- --geolife.cql <filter>
-
A CQL filter; only data matching this filter will be ingested.
- --geolife.typename <types>
-
A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --geolife.maxVertices <count>
-
Maximum number of vertices to allow for the feature. Features with a vertex count over this value will be discarded.
- --geolife.minSimpVertices <count>
-
Minimum vertex count to qualify for geometry simplification.
- --geolife.tolerance <tolerance>
-
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the geotools-raster
format is used, additional options are:
- --geotools-raster.coverage <name>
-
Coverage name for the raster. Default is the name of the file.
- --geotools-raster.crs <crs>
-
A CRS override for the provided raster file.
- --geotools-raster.histogram
-
If specified, build a histogram of samples per band on ingest for performing band equalization.
- --geotools-raster.mergeStrategy <strategy>
-
The tile merge strategy to use for mosaic. Specifying no-data will mosaic the most recent tile over the previous tiles, except where there are no data values. By default, none is used.
- --geotools-raster.nodata <value>
-
Optional parameter to set no data values. If one value is given, it is applied to each band; if multiple values are given, the first totalNoDataValues/totalBands values are applied to the first band, and so on, so each band can have multiple differing no data values if needed.
- --geotools-raster.pyramid
-
If specified, build an image pyramid on ingest for quick reduced-resolution queries.
- --geotools-raster.separateBands
-
If specified, separate each band into its own coverage name. By default, the coverage name will have _Bn appended to it, where n is the band's index.
- --geotools-raster.tileSize <size>
-
The tile size of stored tiles. Default is 256.
When the geotools-vector
format is used, additional options are:
- --geotools-vector.cql <filter>
-
A CQL filter; only data matching this filter will be ingested.
- --geotools-vector.data <fields>
-
A map of date field names to the date format of the file. Use commas to separate each entry; the first : character in each entry separates the field name from the format. Use \, to include a comma in the format. For example, time:MM:dd:YYYY,time2:YYYY/MM/dd hh:mm:ss configures the fields time and time2 as dates with different formats.
- --geotools-vector.type <types>
-
Optional parameter that specifies specific type name(s) from the source file.
When the gpx
format is used, additional options are:
- --gpx.avro
-
A flag to indicate whether Avro feature serialization should be used.
- --gpx.cql <filter>
-
A CQL filter; only data matching this filter will be ingested.
- --gpx.typename <types>
-
A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --gpx.maxLength <degrees>
-
Maximum extent (in both dimensions) for a GPX track in degrees. Used to remove excessively long GPX tracks.
- --gpx.maxVertices <count>
-
Maximum number of vertices to allow for the feature. Features with a vertex count over this value will be discarded.
- --gpx.minSimpVertices <count>
-
Minimum vertex count to qualify for geometry simplification.
- --gpx.tolerance <tolerance>
-
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the tdrive
format is used, additional options are:
- --tdrive.avro
-
A flag to indicate whether Avro feature serialization should be used.
- --tdrive.cql <filter>
-
A CQL filter; only data matching this filter will be ingested.
- --tdrive.typename <types>
-
A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --tdrive.maxVertices <count>
-
Maximum number of vertices to allow for the feature. Features with a vertex count over this value will be discarded.
- --tdrive.minSimpVertices <count>
-
Minimum vertex count to qualify for geometry simplification.
- --tdrive.tolerance <tolerance>
-
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the twitter
format is used, additional options are:
- --twitter.avro
-
A flag to indicate whether Avro feature serialization should be used.
- --twitter.cql <filter>
-
A CQL filter; only data matching this filter will be ingested.
- --twitter.typename <types>
-
A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --twitter.maxVertices <count>
-
Maximum number of vertices to allow for the feature. Features with a vertex count over this value will be discarded.
- --twitter.minSimpVertices <count>
-
Minimum vertex count to qualify for geometry simplification.
- --twitter.tolerance <tolerance>
-
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
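For example, to stage all supported files under a local directory to Kafka (a sketch; the localToKafka subcommand name, the sample directory, and the properties file path are assumptions):
geowave ingest localToKafka --kafkaprops ./kafka.properties ./data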
Ingest Local to GeoWave with MapReduce
NAME
geowave-ingest-localToMrGW - Copy supported files from local file system to HDFS and ingest from HDFS
SYNOPSIS
geowave ingest localToMrGW [options] <file or directory> <hdfs host:port> <path to base directory to write to> <store name> <comma delimited index list>
DESCRIPTION
This command copies supported files from local file system to HDFS and then ingests from HDFS.
OPTIONS
- --jobtracker <host>
-
Hadoop job tracker hostname and port in the format hostname:port.
- --resourceman <host>
-
Yarn resource manager hostname and port in the format hostname:port.
- -x, --extension <extensions>
-
Individual or comma-delimited set of file extensions to accept.
- -f, --formats <formats>
-
Explicitly set the ingest formats by name (or multiple comma-delimited formats). If not set, all available ingest formats will be used.
- -v, --visibility <visibility>
-
The global visibility of the data ingested (optional; if not specified, the data will be unrestricted).
- -fv, --fieldVisibility <visibility>
-
Specify the visibility of a specific field in the format <fieldName>:<visibility>. This option can be specified multiple times for different fields.
- -va, --visibilityAttribute <field>
-
Specify a field that contains visibility information for the whole row. If specified, any field visibilities defined by -fv will be ignored.
- --jsonVisibilityAttribute
-
If specified, the value of the visibility field defined by -va will be treated as a JSON object with keys that represent fields and values that represent their visibility.
When the avro
format is used, additional options are:
- --avro.avro
-
If specified, indicates that the operation should use Avro feature serialization.
- --avro.cql <filter>
-
An optional CQL filter. If specified, only data matching the filter will be ingested.
- --avro.typename <types>
-
A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all type names will be ingested.
- --avro.maxVertices <count>
-
Maximum number of vertices to allow for the feature. Features with a vertex count over this value will be discarded.
- --avro.minSimpVertices <count>
-
Minimum vertex count to qualify for geometry simplification.
- --avro.tolerance <tolerance>
-
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the gdelt
format is used, additional options are:
- --gdelt.avro
-
A flag to indicate whether Avro feature serialization should be used.
- --gdelt.cql <filter>
-
A CQL filter; only data matching this filter will be ingested.
- --gdelt.extended
-
A flag to indicate whether the extended data format should be used.
- --gdelt.typename <types>
-
A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --gdelt.maxVertices <count>
-
Maximum number of vertices to allow for the feature. Features with a vertex count over this value will be discarded.
- --gdelt.minSimpVertices <count>
-
Minimum vertex count to qualify for geometry simplification.
- --gdelt.tolerance <tolerance>
-
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the geolife
format is used, additional options are:
- --geolife.avro
-
A flag to indicate whether Avro feature serialization should be used.
- --geolife.cql <filter>
-
A CQL filter; only data matching this filter will be ingested.
- --geolife.typename <types>
-
A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --geolife.maxVertices <count>
-
Maximum number of vertices to allow for the feature. Features with a vertex count over this value will be discarded.
- --geolife.minSimpVertices <count>
-
Minimum vertex count to qualify for geometry simplification.
- --geolife.tolerance <tolerance>
-
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the geotools-raster
format is used, additional options are:
- --geotools-raster.coverage <name>
-
Coverage name for the raster. Default is the name of the file.
- --geotools-raster.crs <crs>
-
A CRS override for the provided raster file.
- --geotools-raster.histogram
-
If specified, build a histogram of samples per band on ingest for performing band equalization.
- --geotools-raster.mergeStrategy <strategy>
-
The tile merge strategy to use for mosaic. Specifying no-data will mosaic the most recent tile over the previous tiles, except where there are no data values. By default, none is used.
- --geotools-raster.nodata <value>
-
Optional parameter to set no data values. If one value is given, it is applied to each band; if multiple values are given, the first totalNoDataValues/totalBands values are applied to the first band, and so on, so each band can have multiple differing no data values if needed.
- --geotools-raster.pyramid
-
If specified, build an image pyramid on ingest for quick reduced-resolution queries.
- --geotools-raster.separateBands
-
If specified, separate each band into its own coverage name. By default, the coverage name will have _Bn appended to it, where n is the band's index.
- --geotools-raster.tileSize <size>
-
The tile size of stored tiles. Default is 256.
When the geotools-vector
format is used, additional options are:
- --geotools-vector.cql <filter>
-
A CQL filter; only data matching this filter will be ingested.
- --geotools-vector.data <fields>
-
A map of date field names to the date format of the file. Use commas to separate each entry; the first : character in each entry separates the field name from the format. Use \, to include a comma in the format. For example, time:MM:dd:YYYY,time2:YYYY/MM/dd hh:mm:ss configures the fields time and time2 as dates with different formats.
- --geotools-vector.type <types>
-
Optional parameter that specifies specific type name(s) from the source file.
When the gpx
format is used, additional options are:
- --gpx.avro
-
A flag to indicate whether Avro feature serialization should be used.
- --gpx.cql <filter>
-
A CQL filter; only data matching this filter will be ingested.
- --gpx.typename <types>
-
A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --gpx.maxLength <degrees>
-
Maximum extent (in both dimensions) for a GPX track in degrees. Used to remove excessively long GPX tracks.
- --gpx.maxVertices <count>
-
Maximum number of vertices to allow for the feature. Features with a vertex count over this value will be discarded.
- --gpx.minSimpVertices <count>
-
Minimum vertex count to qualify for geometry simplification.
- --gpx.tolerance <tolerance>
-
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the tdrive
format is used, additional options are:
- --tdrive.avro
-
A flag to indicate whether Avro feature serialization should be used.
- --tdrive.cql <filter>
-
A CQL filter; only data matching this filter will be ingested.
- --tdrive.typename <types>
-
A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --tdrive.maxVertices <count>
-
Maximum number of vertices to allow for the feature. Features with a vertex count over this value will be discarded.
- --tdrive.minSimpVertices <count>
-
Minimum vertex count to qualify for geometry simplification.
- --tdrive.tolerance <tolerance>
-
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the twitter
format is used, additional options are:
- --twitter.avro
-
A flag to indicate whether Avro feature serialization should be used.
- --twitter.cql <filter>
-
A CQL filter; only data matching this filter will be ingested.
- --twitter.typename <types>
-
A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --twitter.maxVertices <count>
-
Maximum number of vertices to allow for the feature. Features with a vertex count over this value will be discarded.
- --twitter.minSimpVertices <count>
-
Minimum vertex count to qualify for geometry simplification.
- --twitter.tolerance <tolerance>
-
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
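For example, following the synopsis above (the HDFS host, paths, store, and index names are illustrative):
geowave ingest localToMrGW ./data hdfs-host:8020 /geowave/staging my_store spatial-idx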
Ingest MapReduce to GeoWave
SYNOPSIS
geowave ingest mrToGW [options] <hdfs host:port> <path to base directory to write to> <store name> <comma delimited index list>
OPTIONS
- --jobtracker <host>
-
Hadoop job tracker hostname and port in the format hostname:port.
- --resourceman <host>
-
Yarn resource manager hostname and port in the format hostname:port.
- -x, --extension <extensions>
-
Individual or comma-delimited set of file extensions to accept.
- -f, --formats <formats>
-
Explicitly set the ingest formats by name (or multiple comma-delimited formats). If not set, all available ingest formats will be used.
- -v, --visibility <visibility>
-
The global visibility of the data ingested (optional; if not specified, the data will be unrestricted).
- -fv, --fieldVisibility <visibility>
-
Specify the visibility of a specific field in the format <fieldName>:<visibility>. This option can be specified multiple times for different fields.
- -va, --visibilityAttribute <field>
-
Specify a field that contains visibility information for the whole row. If specified, any field visibilities defined by -fv will be ignored.
- --jsonVisibilityAttribute
-
If specified, the value of the visibility field defined by -va will be treated as a JSON object with keys that represent fields and values that represent their visibility.
When the avro
format is used, additional options are:
- --avro.avro
-
If specified, indicates that the operation should use Avro feature serialization.
- --avro.cql <filter>
-
An optional CQL filter. If specified, only data matching the filter will be ingested.
- --avro.typename <types>
-
A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all type names will be ingested.
- --avro.maxVertices <count>
-
Maximum number of vertices to allow for the feature. Features with a vertex count over this value will be discarded.
- --avro.minSimpVertices <count>
-
Minimum vertex count to qualify for geometry simplification.
- --avro.tolerance <tolerance>
-
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the gdelt
format is used, additional options are:
- --gdelt.avro
-
A flag to indicate whether Avro feature serialization should be used.
- --gdelt.cql <filter>
-
A CQL filter; only data matching this filter will be ingested.
- --gdelt.extended
-
A flag to indicate whether the extended data format should be used.
- --gdelt.typename <types>
-
A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --gdelt.maxVertices <count>
-
Maximum number of vertices to allow for the feature. Features with a vertex count over this value will be discarded.
- --gdelt.minSimpVertices <count>
-
Minimum vertex count to qualify for geometry simplification.
- --gdelt.tolerance <tolerance>
-
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the geolife
format is used, additional options are:
- --geolife.avro
-
A flag to indicate whether Avro feature serialization should be used.
- --geolife.cql <filter>
-
A CQL filter; only data matching this filter will be ingested.
- --geolife.typename <types>
-
A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --geolife.maxVertices <count>
-
Maximum number of vertices to allow for the feature. Features with a vertex count over this value will be discarded.
- --geolife.minSimpVertices <count>
-
Minimum vertex count to qualify for geometry simplification.
- --geolife.tolerance <tolerance>
-
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the geotools-raster
format is used, additional options are:
- --geotools-raster.coverage <name>
-
Coverage name for the raster. Default is the name of the file.
- --geotools-raster.crs <crs>
-
A CRS override for the provided raster file.
- --geotools-raster.histogram
-
If specified, build a histogram of samples per band on ingest for performing band equalization.
- --geotools-raster.mergeStrategy <strategy>
-
The tile merge strategy to use for mosaic. Specifying no-data will mosaic the most recent tile over the previous tiles, except where there are no data values. By default, none is used.
- --geotools-raster.nodata <value>
-
Optional parameter to set no data values. If one value is given, it is applied to each band; if multiple values are given, the first totalNoDataValues/totalBands values are applied to the first band, and so on, so each band can have multiple differing no data values if needed.
- --geotools-raster.pyramid
-
If specified, build an image pyramid on ingest for quick reduced-resolution queries.
- --geotools-raster.separateBands
-
If specified, separate each band into its own coverage name. By default, the coverage name will have _Bn appended to it, where n is the band's index.
- --geotools-raster.tileSize <size>
-
The tile size of stored tiles. Default is 256.
When the geotools-vector
format is used, additional options are:
- --geotools-vector.cql <filter>
-
A CQL filter; only data matching this filter will be ingested.
- --geotools-vector.data <fields>
-
A map of date field names to the date format of the file. Use commas to separate each entry; the first : character in each entry separates the field name from the format. Use \, to include a comma in the format. For example, time:MM:dd:YYYY,time2:YYYY/MM/dd hh:mm:ss configures the fields time and time2 as dates with different formats.
- --geotools-vector.type <types>
-
Optional parameter that specifies specific type name(s) from the source file.
When the gpx
format is used, additional options are:
- --gpx.avro
-
A flag to indicate whether Avro feature serialization should be used.
- --gpx.cql <filter>
-
A CQL filter; only data matching this filter will be ingested.
- --gpx.typename <types>
-
A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --gpx.maxLength <degrees>
-
Maximum extent (in both dimensions) for a GPX track in degrees. Used to remove excessively long GPX tracks.
- --gpx.maxVertices <count>
-
Maximum number of vertices to allow for the feature. Features with a vertex count over this value will be discarded.
- --gpx.minSimpVertices <count>
-
Minimum vertex count to qualify for geometry simplification.
- --gpx.tolerance <tolerance>
-
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the tdrive
format is used, additional options are:
- --tdrive.avro
-
A flag to indicate whether Avro feature serialization should be used.
- --tdrive.cql <filter>
-
A CQL filter; only data matching this filter will be ingested.
- --tdrive.typename <types>
-
A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --tdrive.maxVertices <count>
-
Maximum number of vertices to allow for the feature. Features with a vertex count over this value will be discarded.
- --tdrive.minSimpVertices <count>
-
Minimum vertex count to qualify for geometry simplification.
- --tdrive.tolerance <tolerance>
-
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
When the twitter
format is used, additional options are:
- --twitter.avro
-
A flag to indicate whether Avro feature serialization should be used.
- --twitter.cql <filter>
-
A CQL filter; only data matching this filter will be ingested.
- --twitter.typename <types>
-
A comma-delimited set of type names to ingest; only feature types matching the specified type names will be ingested. By default, all types will be ingested.
- --twitter.maxVertices <count>
-
Maximum number of vertices to allow for the feature. Features with a vertex count over this value will be discarded.
- --twitter.minSimpVertices <count>
-
Minimum vertex count to qualify for geometry simplification.
- --twitter.tolerance <tolerance>
-
Maximum error tolerance in geometry simplification. Should range from 0.0 to 1.0 (e.g., 0.1 = 10%). Default is 0.02.
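For example, following the synopsis above, to ingest previously staged data from HDFS (the HDFS host, path, store, and index names are illustrative):
geowave ingest mrToGW hdfs-host:8020 /geowave/staging my_store spatial-idx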
Ingest Spark to GeoWave
SYNOPSIS
geowave ingest sparkToGW [options] <input directory> <store name> <comma delimited index list>
OPTIONS
- -ho, --hosts <host>
-
The Spark driver host. Default is localhost.
- -m, --master <designation>
-
The Spark master designation. Default is local.
- -n, --name <name>
-
The Spark application name. Default is Spark Ingest.
- -c, --numcores <count>
-
The number of cores to use.
- -e, --numexecutors <count>
-
The number of executors to use.
- -x, --extension <extensions>
-
Individual or comma-delimited set of file extensions to accept.
- -f, --formats <formats>
-
Explicitly set the ingest formats by name (or multiple comma-delimited formats). If not set, all available ingest formats will be used.
- -v, --visibility <visibility>
-
The global visibility of the data ingested (optional; if not specified, the data will be unrestricted).
- -fv, --fieldVisibility <visibility>
-
Specify the visibility of a specific field in the format <fieldName>:<visibility>. This option can be specified multiple times for different fields.
- -va, --visibilityAttribute <field>
-
Specify a field that contains visibility information for the whole row. If specified, any field visibilities defined by -fv will be ignored.
- --jsonVisibilityAttribute
-
If specified, the value of the visibility field defined by -va will be treated as a JSON object with keys that represent fields and values that represent their visibility.
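For example, following the synopsis above, to ingest a local directory using a local Spark master (the directory, store, and index names are illustrative):
geowave ingest sparkToGW -m local ./data my_store spatial-idx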
Query Commands
Commands related to querying data.
Query
DESCRIPTION
This command queries data using an SQL-like syntax. The query language currently only supports SELECT
and DELETE
statements.
The syntax for SELECT
statements is as follows:
SELECT <attributes> FROM <typeName> [ WHERE <filter> ] [ LIMIT <count> ]
Where <attributes> is a comma-separated list of column selectors or aggregation functions, <typeName> is the type name, <filter> is the set of constraints to filter the results by, and <count> is the maximum number of results to return.
The syntax for DELETE
statements is as follows:
DELETE FROM <typeName> [ WHERE <filter> ]
Where <typeName> is the type name and <filter> is the set of constraints to delete results by.
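For example, using the example data store and countries type from the examples below, the following statement would delete all countries with a population under one million:
geowave query example "DELETE FROM countries WHERE population < 1000000"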
OPTIONS
- --debug
-
If specified, print out additional info for debug purposes.
- -f, --format <format>
-
Output format for query results. Possible values are console, csv, shp, and geojson. Both shp and geojson formats require that the query results contain at least one geometry column. Default is console.
When the csv
format is used, additional options are:
- * -o, --outputFile <file>
-
CSV file to output query results to.
When the shp
format is used, additional options are:
- * -o, --outputFile <file>
-
Shapefile to output query results to.
- -t, --typeName <name>
-
Output feature type name.
When the geojson
format is used, additional options are:
- * -o, --outputFile <file>
-
GeoJSON file to output query results to.
- -t, --typeName <name>
-
Output feature type name.
EXAMPLES
Calculate the total population of countries that intersect a bounding box that covers a region of Europe from the example
data store:
geowave query example "SELECT SUM(population) FROM countries WHERE BBOX(geom, 7, 23, 46, 51)"
Select only countries that have a population over 100 million from the example
data store:
geowave query example "SELECT * FROM countries WHERE population > 100000000"
Output country names and populations to a CSV file from the example
data store:
geowave query -f csv -o myfile.csv example "SELECT name, population FROM example.countries"
Analytic Commands
Commands that run MapReduce or Spark processing to enhance an existing GeoWave dataset.
The commands below can also be run as a Yarn or Hadoop API command (i.e., MapReduce). For instance, the analytic can be submitted directly through Yarn, as sketched below.
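A minimal sketch, assuming the GeoWave tools JAR that ships with the installation (the JAR name and path are illustrative):
yarn jar geowave-tools.jar analytic dbscan <options>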
Density-Based Scan
OPTIONS
- -conf, --mapReduceConfigFile <file>
-
MapReduce configuration file.
- * -hdfsbase, --mapReduceHdfsBaseDir <path>
-
Fully qualified path to the base directory in HDFS.
- * -jobtracker, --mapReduceJobtrackerHostPort <host>
-
[REQUIRED (or -resourceman)] Hadoop job tracker hostname and port in the format hostname:port.
- * -resourceman, --mapReduceYarnResourceManager <host>
-
[REQUIRED (or -jobtracker)] Yarn resource manager hostname and port in the format hostname:port.
- -hdfs, --mapReduceHdfsHostPort <host>
-
HDFS hostname and port in the format hostname:port.
- --cdf, --commonDistanceFunctionClass <class>
-
Distance function class that implements org.locationtech.geowave.analytics.distance.DistanceFn.
- * --query.typeNames <types>
-
The comma-separated list of types to query; by default all types are used.
- --query.auth <auths>
-
The comma-separated list of authorizations used during extract; by default all authorizations are used.
- --query.index <index>
-
The specific index to query; by default one is chosen for each adapter.
- * -emx, --extractMaxInputSplit <size>
-
Maximum HDFS input split size.
- * -emn, --extractMinInputSplit <size>
-
Minimum HDFS input split size.
- -eq, --extractQuery <query>
-
Query
- -ofc, --outputOutputFormat <class>
-
Output format class.
- -ifc, --inputFormatClass <class>
-
Input format class.
- -orc, --outputReducerCount <count>
-
Number of reducers for output.
- * -cmi, --clusteringMaxIterations <count>
-
Maximum number of iterations when finding optimal clusters.
- * -cms, --clusteringMinimumSize <size>
-
Minimum cluster size.
- * -pmd, --partitionMaxDistance <distance>
-
Maximum partition distance.
- -b, --globalBatchId <id>
-
Batch ID.
- -hdt, --hullDataTypeId <id>
-
Data Type ID for a centroid item.
- -hpe, --hullProjectionClass <class>
-
Class to project onto 2D space. Implements org.locationtech.geowave.analytics.tools.Projection.
- -ons, --outputDataNamespaceUri <namespace>
-
Output namespace for objects that will be written to GeoWave.
- -odt, --outputDataTypeId <id>
-
Output Data ID assigned to objects that will be written to GeoWave.
- -oop, --outputHdfsOutputPath <path>
-
Output HDFS file path.
- -oid, --outputIndexId <index>
-
Output index for objects that will be written to GeoWave.
- -pdt, --partitionDistanceThresholds <thresholds>
-
Comma separated list of distance thresholds, per dimension.
- -pdu, --partitionGeometricDistanceUnit <unit>
-
Geometric distance unit (m=meters, km=kilometers; see symbols for javax.units.BaseUnit).
- -pms, --partitionMaxMemberSelection <count>
-
Maximum number of members selected from a partition.
- -pdr, --partitionPartitionDecreaseRate <rate>
-
Rate of decrease for precision (within (0,1]).
- -pp, --partitionPartitionPrecision <precision>
-
Partition precision.
- -pc, --partitionPartitionerClass <class>
-
Index identifier for centroids.
- -psp, --partitionSecondaryPartitionerClass <class>
-
Perform secondary partitioning with the provided class.
EXAMPLES
Run through a maximum of 5 iterations (-cmi), with a minimum cluster size of 10 (-cms), a minimum HDFS input split of 2 (-emn), a maximum HDFS input split of 6 (-emx), a maximum search distance of 1000 meters (-pmd), a reducer count of 4 (-orc), the HDFS IPC port at localhost:53000 (-hdfs), the Yarn job tracker at localhost:8032 (-jobtracker), temporary files stored in hdfs://host:port/user/rwgdrummer (-hdfsbase), the gpxpoint data type (--query.typeNames), and data store connection parameters loaded from my_store.
geowave analytic dbscan -cmi 5 -cms 10 -emn 2 -emx 6 -pmd 1000 -orc 4 -hdfs localhost:53000 -jobtracker localhost:8032 -hdfsbase /user/rwgdrummer --query.typeNames gpxpoint my_store
EXECUTION
DBSCAN uses GeoWaveInputFormat to load data from GeoWave into HDFS. You can use the extract query parameter to limit the records used in the analytic.
It iteratively calls Nearest Neighbor to execute a sequence of concave hulls. The hulls are saved into sequence files written to a temporary HDFS directory, and then read in again for the next DBSCAN iteration.
After completion, the data is written back from HDFS to Accumulo using a job called the "input load runner".
Kernel Density Estimate
OPTIONS
- * --coverageName <name>
-
The output coverage name.
- * --featureType <type>
-
The name of the feature type to run a KDE on.
- * --minLevel <level>
-
The minimum zoom level to run a KDE at.
- * --maxLevel <level>
-
The maximum zoom level to run a KDE at.
- --minSplits <count>
-
The minimum partitions for the input data.
- --maxSplits <count>
-
The maximum partitions for the input data.
- --tileSize <size>
-
The size of output tiles.
- --cqlFilter <filter>
-
An optional CQL filter applied to the input data.
- --indexName <index>
-
An optional index to filter the input data.
- --outputIndex <index>
-
An optional index for output data store. Only spatial index type is supported.
- --hdfsHostPort <host>
-
The HDFS host and port.
- * --jobSubmissionHostPort <host>
-
The job submission tracker host and port in the format hostname:port.
EXAMPLES
Perform a Kernel Density Estimation using a local resource manager at port 8032 on the gdeltevent type. The KDE is run at zoom levels 5-26, and the generated raster is written under the coverage name gdeltevent_kde. Both the input and output data stores are called gdelt.
geowave analytic kde --featureType gdeltevent --jobSubmissionHostPort localhost:8032 --minLevel 5 --maxLevel 26 --coverageName gdeltevent_kde gdelt gdelt
Kernel Density Estimate on Spark
DESCRIPTION
This command runs a Kernel Density Estimate analytic on GeoWave data using Apache Spark.
OPTIONS
- * --coverageName <name>
-
The output coverage name.
- * --featureType <type>
-
The name of the feature type to run a KDE on.
- * --minLevel <level>
-
The minimum zoom level to run a KDE at.
- * --maxLevel <level>
-
The maximum zoom level to run a KDE at.
- --minSplits <count>
-
The minimum partitions for the input data.
- --maxSplits <count>
-
The maximum partitions for the input data.
- --tileSize <size>
-
The size of output tiles.
- --cqlFilter <filter>
-
An optional CQL filter applied to the input data.
- --indexName <index>
-
An optional index name to filter the input data.
- --outputIndex <index>
-
An optional index for output data store. Only spatial index type is supported.
- -n, --name <name>
-
The Spark application name.
- -ho, --host <host>
-
The Spark driver host.
- -m, --master <designation>
-
The Spark master designation.
EXAMPLES
Perform a Kernel Density Estimation using a local Spark cluster on the gdeltevent type. The KDE is run at zoom levels 5-26, and the generated raster is written under the coverage name gdeltevent_kde. Both the input and output data stores are called gdelt.
geowave analytic kdespark --featureType gdeltevent -m local --minLevel 5 --maxLevel 26 --coverageName gdeltevent_kde gdelt gdelt
K-means Jump
OPTIONS
- -conf, --mapReduceConfigFile <file>
-
MapReduce configuration file.
- * -hdfsbase, --mapReduceHdfsBaseDir <path>
-
Fully qualified path to the base directory in HDFS.
- * -jobtracker, --mapReduceJobtrackerHostPort <host>
-
[REQUIRED (or -resourceman)] Hadoop job tracker hostname and port in the format hostname:port.
- * -resourceman, --mapReduceYarnResourceManager <host>
-
[REQUIRED (or -jobtracker)] Yarn resource manager hostname and port in the format hostname:port.
- -hdfs, --mapReduceHdfsHostPort <host>
-
HDFS hostname and port in the format hostname:port.
- --cdf, --commonDistanceFunctionClass <class>
-
Distance function class that implements org.locationtech.geowave.analytics.distance.DistanceFn.
- * --query.typeNames <types>
-
The comma-separated list of types to query; by default all types are used.
- --query.auth <auths>
-
The comma-separated list of authorizations used during extract; by default all authorizations are used.
- --query.index <index>
-
The specific index to query; by default one is chosen for each adapter.
- * -emx, --extractMaxInputSplit <size>
-
Maximum HDFS input split size.
- * -emn, --extractMinInputSplit <size>
-
Minimum HDFS input split size.
- -eq, --extractQuery <query>
-
Query
- -ofc, --outputOutputFormat <class>
-
Output format class.
- -ifc, --inputFormatClass <class>
-
Input format class.
- -orc, --outputReducerCount <count>
-
Number of reducers for output.
- -cce, --centroidExtractorClass <class>
-
Centroid extractor class that implements org.locationtech.geowave.analytics.extract.CentroidExtractor.
- -cid, --centroidIndexId <index>
-
Index to use for centroids.
- -cfc, --centroidWrapperFactoryClass <class>
-
A factory class that implements org.locationtech.geowave.analytics.tools.AnalyticItemWrapperFactory.
- -czl, --centroidZoomLevel <level>
-
Zoom level for centroids.
- -cct, --clusteringConverganceTolerance <tolerance>
-
Convergence tolerance.
- * -cmi, --clusteringMaxIterations <count>
-
Maximum number of iterations when finding optimal clusters.
- -crc, --clusteringMaxReducerCount <count>
-
Maximum clustering reducer count.
- * -zl, --clusteringZoomLevels <count>
-
Number of zoom levels to process.
- -dde, --commonDimensionExtractClass <class>
-
Dimension extractor class that implements org.locationtech.geowave.analytics.extract.DimensionExtractor.
- -ens, --extractDataNamespaceUri <namespace>
-
Output data namespace URI.
- -ede, --extractDimensionExtractClass <class>
-
Class to extract dimensions into a simple feature output.
- -eot, --extractOutputDataTypeId <type>
-
Output data type ID.
- -erc, --extractReducerCount <count>
-
Number of reducers for initial data extraction and de-duplication.
- -b, --globalBatchId <id>
-
Batch ID.
- -pb, --globalParentBatchId <id>
-
Parent Batch ID.
- -hns, --hullDataNamespaceUri <namespace>
-
Data type namespace for a centroid item.
- -hdt, --hullDataTypeId <type>
-
Data type ID for a centroid item.
- -hid, --hullIndexId <index>
-
Index to use for centroids.
- -hpe, --hullProjectionClass <class>
-
Class to project onto 2D space. Implements org.locationtech.geowave.analytics.tools.Projection.
- -hrc, --hullReducerCount <count>
-
Centroid reducer count.
- -hfc, --hullWrapperFactoryClass <class>
-
Class to create an analytic item to capture hulls. Implements org.locationtech.geowave.analytics.tools.AnalyticItemWrapperFactory.
- * -jkp, --jumpKplusplusMin <value>
-
The minimum K when K-means parallel takes over sampling.
- * -jrc, --jumpRangeOfCentroids <ranges>
-
Comma-separated range of centroids (e.g. 2,100).
EXAMPLES
The maximum clustering iterations is 15 (-cmi), the zoom level is 1 (-zl), the maximum HDFS input split is 4000 (-emx), the minimum HDFS input split is 100 (-emn), the temporary files needed by this job are stored in hdfs://host:port/user/rwgdrummer/temp_dir_kmeans (-hdfsbase), the HDFS IPC port is localhost:53000 (-hdfs), the Yarn job tracker is at localhost:8032 (-jobtracker), the type used is hail (--query.typeNames), the minimum K for K-means parallel sampling is 3 (-jkp), the comma-separated range of centroids is 4,8 (-jrc), and the data store parameters are loaded from my_store.
geowave analytic kmeansjump -cmi 15 -zl 1 -emx 4000 -emn 100 -hdfsbase /usr/rwgdrummer/temp_dir_kmeans -hdfs localhost:53000 -jobtracker localhost:8032 --query.typeNames hail -jkp 3 -jrc 4,8 my_store
EXECUTION
KMeansJump uses most of the same parameters as KMeansParallel. It tries every K value given (-jrc) to find the value with the least entropy. The other value, -jkp, specifies which K values should use K-means parallel for sampling versus a single sampler (which uses a random sample). For instance, if you specify 4,8 for -jrc and 6 for -jkp, then K=4,5 will use the K-means parallel sampler, while K=6,7,8 will use the single sampler.
KMeansJump executes several iterations, running the sampler (described above, which also calls the normal K-means algorithm to determine centroids) and then executing a K-means distortion job, which calculates the entropy of the calculated centroids.
See the EXECUTION documentation for the kmeansparallel command for a discussion of output, tolerance, and performance variables.
K-means Parallel
OPTIONS
- -conf, --mapReduceConfigFile <file>
-
MapReduce configuration file.
- * -hdfsbase, --mapReduceHdfsBaseDir <path>
-
Fully qualified path to the base directory in HDFS.
- * -jobtracker, --mapReduceJobtrackerHostPort <host>
-
[REQUIRED (or -resourceman)] Hadoop job tracker hostname and port in the format hostname:port.
- * -resourceman, --mapReduceYarnResourceManager <host>
-
[REQUIRED (or -jobtracker)] Yarn resource manager hostname and port in the format hostname:port.
- -hdfs, --mapReduceHdfsHostPort <host>
-
HDFS hostname and port in the format hostname:port.
- --cdf, --commonDistanceFunctionClass <class>
-
Distance function class that implements org.locationtech.geowave.analytics.distance.DistanceFn.
- * --query.typeNames <types>
-
The comma-separated list of types to query; by default all types are used.
- --query.auth <auths>
-
The comma-separated list of authorizations used during extract; by default all authorizations are used.
- --query.index <index>
-
The specific index to query; by default one is chosen for each adapter.
- * -emx, --extractMaxInputSplit <size>
-
Maximum HDFS input split size.
- * -emn, --extractMinInputSplit <size>
-
Minimum HDFS input split size.
- -eq, --extractQuery <query>
-
Query
- -ofc, --outputOutputFormat <class>
-
Output format class.
- -ifc, --inputFormatClass <class>
-
Input format class.
- -orc, --outputReducerCount <count>
-
Number of reducers for output.
- -cce, --centroidExtractorClass <class>
-
Centroid extractor class that implements org.locationtech.geowave.analytics.extract.CentroidExtractor.
- -cid, --centroidIndexId <index>
-
Index to use for centroids.
- -cfc, --centroidWrapperFactoryClass <class>
-
A factory class that implements org.locationtech.geowave.analytics.tools.AnalyticItemWrapperFactory.
- -czl, --centroidZoomLevel <level>
-
Zoom level for centroids.
- -cct, --clusteringConverganceTolerance <tolerance>
-
Convergence tolerance.
- * -cmi, --clusteringMaxIterations <count>
-
Maximum number of iterations when finding optimal clusters.
- -crc, --clusteringMaxReducerCount <count>
-
Maximum clustering reducer count.
- * -zl, --clusteringZoomLevels <count>
-
Number of zoom levels to process.
- -dde, --commonDimensionExtractClass <class>
-
Dimension extractor class that implements org.locationtech.geowave.analytics.extract.DimensionExtractor.
- -ens, --extractDataNamespaceUri <namespace>
-
Output data namespace URI.
- -ede, --extractDimensionExtractClass <class>
-
Class to extract dimensions into a simple feature output.
- -eot, --extractOutputDataTypeId <type>
-
Output data type ID.
- -erc, --extractReducerCount <count>
-
Number of reducers for initial data extraction and de-duplication.
- -b, --globalBatchId <id>
-
Batch ID.
- -pb, --globalParentBatchId <id>
-
Parent Batch ID.
- -hns, --hullDataNamespaceUri <namespace>
-
Data type namespace for a centroid item.
- -hdt, --hullDataTypeId <type>
-
Data type ID for a centroid item.
- -hid, --hullIndexId <index>
-
Index to use for centroids.
- -hpe, --hullProjectionClass <class>
-
Class to project onto 2D space. Implements org.locationtech.geowave.analytics.tools.Projection.
- -hrc, --hullReducerCount <count>
-
Centroid reducer count.
- -hfc, --hullWrapperFactoryClass <class>
-
Class to create an analytic item to capture hulls. Implements org.locationtech.geowave.analytics.tools.AnalyticItemWrapperFactory.
- * -sxs, --sampleMaxSampleSize <size>
-
Maximum sample size.
- * -sms, --sampleMinSampleSize <size>
-
Minimum sample size.
- * -ssi, --sampleSampleIterations <count>
-
Minimum number of sample iterations.
EXAMPLES
The maximum clustering iterations is 15 (-cmi), the zoom level is 1 (-zl), the maximum HDFS input split is 4000 (-emx), the minimum HDFS input split is 100 (-emn), the temporary files needed by this job are stored in hdfs://host:port/user/rwgdrummer/temp_dir_kmeans (-hdfsbase), the HDFS IPC port is localhost:53000 (-hdfs), the Yarn job tracker is at localhost:8032 (-jobtracker), the type used is hail (--query.typeNames), the minimum sample size is 4 (-sms, which is kmin), the maximum sample size is 8 (-sxs, which is kmax), the minimum number of sampling iterations is 10 (-ssi), and the data store parameters are loaded from my_store.
geowave analytic kmeansparallel -cmi 15 -zl 1 -emx 4000 -emn 100 -hdfsbase /usr/rwgdrummer/temp_dir_kmeans -hdfs localhost:53000 -jobtracker localhost:8032 --query.typeNames hail -sms 4 -sxs 8 -ssi 10 my_store
EXECUTION
K-means parallel tries to identify the optimal K (between -sms and -sxs) for a set of zoom levels (1 → -zl). When the zoom level is 1, it will perform a normal K-means and find K clusters. If the zoom level is 2 or higher, it will take each cluster found, and then try to create sub-clusters (bounded by that cluster), identifying a new optimal K for that sub-cluster. As such, without powerful infrastructure, this approach could take a significant amount of time to complete with zoom levels higher than 1.
K-means parallel executes by first performing an extraction and de-duplication on data received via GeoWaveInputFormat. The data is copied to HDFS for faster processing. The K-sampler job is used to pick sample centroid points. These centroids are then assigned a cost, and weak centroids are stripped before the K-sampler is executed again. This process iterates several times before the best centroid locations are found, which are fed into the real K-means algorithm as initial guesses. K-means iterates until the tolerance is reached (-cct, which defaults to 0.0001) or the maximum number of iterations is met (-cmi).
After execution, K-means parallel writes the centroids to an output data type (-eot, defaults to centroid), and then creates an informational set of convex hulls which you can plot in GeoServer to visually identify cluster groups (-hdt, defaults to convex_hull).
For tuning performance, you can set the number of reducers used in each step. The extraction/dedupe reducer count is -erc, the clustering reducer count is -crc, the convex hull reducer count is -hrc, and the output reducer count is -orc.
If you would like to run the algorithm multiple times, it may be useful to set the batch ID (-b), which can be used to distinguish between multiple batches (runs).
K-means on Spark
OPTIONS
- -ct, --centroidType <type>
-
Feature type name for centroid output. Default is kmeans-centroids.
- -ch, --computeHullData
-
If specified, hull count, area, and density will be computed.
- --cqlFilter <filter>
-
An optional CQL filter applied to the input data.
- -e, --epsilon <tolerance>
-
The convergence tolerance.
- -f, --featureType <type>
-
Feature type name to query.
- -ht, --hullType <type>
-
Feature type name for hull output. Default is kmeans-hulls.
- -h, --hulls
-
If specified, convex hulls will be generated.
- -ho, --host <host>
-
The Spark driver host. Default is localhost.
- -m, --master <designation>
-
The Spark master designation. Default is yarn.
- --maxSplits <count>
-
The maximum partitions for the input data.
- --minSplits <count>
-
The minimum partitions for the input data.
- -n, --name <name>
-
The Spark application name. Default is KMeans Spark.
- -k, --numClusters <count>
-
The number of clusters to generate. Default is 8.
- -i, --numIterations <count>
-
The number of iterations to run. Default is 20.
- -t, --useTime
-
If specified, the time field from the input data will be used.
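For example, to run an 8-cluster K-means with convex hulls on a local Spark master over a hail type (a sketch; the kmeansspark subcommand name and the input/output store names are assumptions):
geowave analytic kmeansspark -f hail -k 8 -m local -h my_store my_store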
Nearest Neighbor
DESCRIPTION
This command executes a Nearest Neighbors analytic. It is similar to DBSCAN, but with fewer arguments. Nearest Neighbor simply dumps all near neighbors for every feature to a list of pairs. Most developers will want to extend the framework to add their own extensions.
OPTIONS
- -conf, --mapReduceConfigFile <file>
-
MapReduce configuration file.
- * -hdfsbase, --mapReduceHdfsBaseDir <path>
-
Fully qualified path to the base directory in HDFS.
- * -jobtracker, --mapReduceJobtrackerHostPort <host>
-
[REQUIRED (or -resourceman)] Hadoop job tracker hostname and port in the format hostname:port.
- * -resourceman, --mapReduceYarnResourceManager <host>
-
[REQUIRED (or -jobtracker)] Yarn resource manager hostname and port in the format hostname:port.
- -hdfs, --mapReduceHdfsHostPort <host>
-
HDFS hostname and port in the format hostname:port.
- --cdf, --commonDistanceFunctionClass <class>
-
Distance function class that implements org.locationtech.geowave.analytics.distance.DistanceFn.
- * --query.typeNames <types>
-
The comma-separated list of types to query; by default all types are used.
- --query.auth <auths>
-
The comma-separated list of authorizations used during extract; by default all authorizations are used.
- --query.index <index>
-
The specific index to query; by default one is chosen for each adapter.
- * -emx, --extractMaxInputSplit <size>
-
Maximum HDFS input split size.
- * -emn, --extractMinInputSplit <size>
-
Minimum HDFS input split size.
- -eq, --extractQuery <query>
-
Query
- -ofc, --outputOutputFormat <class>
-
Output format class.
- -ifc, --inputFormatClass <class>
-
Input format class.
- -orc, --outputReducerCount <count>
-
Number of reducers for output.
- * -oop, --outputHdfsOutputPath <path>
-
Output HDFS file path.
- -pdt, --partitionDistanceThresholds <thresholds>
-
Comma separated list of distance thresholds, per dimension.
- -pdu, --partitionGeometricDistanceUnit <unit>
-
Geometric distance unit (m=meters, km=kilometers; see symbols for javax.units.BaseUnit).
- * -pmd, --partitionMaxDistance <distance>
-
Maximum partition distance.
- -pms, --partitionMaxMemberSelection <count>
-
Maximum number of members selected from a partition.
- -pp, --partitionPartitionPrecision <precision>
-
Partition precision.
- -pc, --partitionPartitionerClass <class>
-
Perform primary partitioning for centroids with the provided class.
- -psp, --partitionSecondaryPartitionerClass <class>
-
Perform secondary partitioning for centroids with the provided class.
EXAMPLES
The minimum HDFS input split is 2 (-emn), the maximum HDFS input split is 6 (-emx), the maximum search distance is 1000 meters (-pmd), the sequence file output directory is hdfs://host:port/user/rwgdrummer_out (-oop), the reducer count is 4 (-orc), the HDFS IPC port is localhost:53000 (-hdfs), the Yarn job tracker is at localhost:8032 (-jobtracker), the temporary files needed by this job are stored in hdfs://host:port/user/rwgdrummer (-hdfsbase), the input type is gpxpoint (--query.typeNames), and the data store parameters are loaded from my_store.
geowave analytic nn -emn 2 -emx 6 -pmd 1000 -oop /user/rwgdrummer_out -orc 4 -hdfs localhost:53000 -jobtracker localhost:8032 -hdfsbase /user/rwgdrummer --query.typeNames gpxpoint my_store
EXECUTION
To execute nearest neighbor search in GeoWave, we use the concept of a "partitioner" to partition all data on the Hilbert curve into square segments for the purposes of parallelizing the search.
The default partitioner will multiply the maximum partition distance (-pmd) by 2 and use that for the actual partition sizes. Because of this, the terminology is a bit confusing, but the -pmd option is actually the most important variable here, describing the max distance for a point to be considered a neighbor to another point.
Spark SQL
DESCRIPTION
This command executes a Spark SQL query against a given data store, e.g., select * from <store name>[|<type name>] where <condition>. An alternate way of querying vector data is the vector query command, which does not use Spark but provides a more robust set of querying capabilities.
OPTIONS
- -n, --name <name>
-
The Spark application name. Default is GeoWave Spark SQL.
- -ho, --host <host>
-
The Spark driver host. Default is localhost.
- -m, --master <designation>
-
The Spark master designation. Default is yarn.
- --csv <file>
-
The output CSV file name.
- --out <store name>
-
The output data store name.
- --outtype <type>
-
The output type to output results to.
- -s, --show <count>
-
Number of result rows to display. Default is 20.
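For example, to display the first 10 results of a query against a hypothetical countries type in the my_store data store (a sketch; the sql subcommand name is an assumption based on this section):
geowave analytic sql -s 10 "select * from my_store|countries where population > 100000000"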
Spark Spatial Join
SYNOPSIS
geowave analytic spatialjoin [options] <left store name> <right store name> <output store name>
DESCRIPTION
This command executes a spatial join, taking two input types and outputting features from each side that match a given predicate.
OPTIONS
- -n, --name <name>
-
The Spark application name. Default is GeoWave Spark SQL.
- -ho, --host <host>
-
The Spark driver host. Default is localhost.
- -m, --master <designation>
-
The Spark master designation. Default is yarn.
- -pc, --partCount <count>
-
The default partition count to set for Spark RDDs. Should be big enough to support the largest RDD that will be used. Sets spark.default.parallelism.
- -lt, --leftTypeName <type>
-
Feature type name of left store to use in join.
- -ol, --outLeftTypeName <type>
-
Feature type name of left join results.
- -rt, --rightTypeName <type>
-
Feature type name of right store to use in join.
- -or, --outRightTypeName <type>
-
Feature type name of right join results.
- -p, --predicate <predicate>
-
Name of the UDF function to use when performing the spatial join. Default is GeomIntersects.
- -r, --radius <radius>
-
Used for distance join predicate and other spatial operations that require a scalar radius. Default is 0.01.
- -not, --negative
-
Used for testing a negative result from the geometry predicate, i.e., GeomIntersects() == false.
EXAMPLES
Using a local Spark cluster, join all features from a hail data type in the my_store store that intersect features from a boundary type in the other_store store, and output the left and right results to the left and right types in the my_store data store.
geowave analytic spatialjoin -m local -lt hail -rt boundary -ol left -or right my_store other_store my_store
Vector Commands
Commands that operate on vector data.
CQL Delete
Local Export
OPTIONS
- --typeNames <types>
-
Comma separated list of types to export.
- --batchSize <size>
-
Records to process at a time. Default is 10,000.
- --cqlFilter <filter>
-
Filter exported data based on CQL filter.
- --indexName <index>
-
The name of the index to export from.
- * --outputFile <file>
-
The file to export data to.
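For example, to export all features of a hypothetical countries type from the my_store data store to a local file (a sketch; the localexport subcommand name is an assumption based on this section):
geowave vector localexport --typeNames countries --outputFile ./countries.avro my_store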
MapReduce Export
DESCRIPTION
This command will perform a data export for vector data in a data store, and will use MapReduce to support high-volume data stores.
OPTIONS
- --typeNames <types> - Comma separated list of types to export.
- --batchSize <size> - Records to process at a time. Default is 10,000.
- --cqlFilter <filter> - Filter exported data based on a CQL filter.
- --indexName <index> - The name of the index to export from.
- --maxSplits <count> - The maximum partitions for the input data.
- --minSplits <count> - The minimum partitions for the input data.
- --resourceManagerHostPort <host> - The host and port of the resource manager.
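EXAMPLES
A sketch of a high-volume export (the hosts, path, and store name are placeholders; it assumes the command is invoked as geowave vector mrexport and takes the HDFS host and port, an output directory, and the store name as arguments):
geowave vector mrexport --typeNames hail --batchSize 5000 --resourceManagerHostPort rm-host:8032 hdfs-host:8020 /export/hail my_store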
Raster Commands
Commands that operate on raster data.
Resize with MapReduce
DESCRIPTION
This command will resize raster tiles that are stored in a GeoWave data store using MapReduce, and write the resized tiles to a new output store.
OPTIONS
- * --hdfsHostPort <host> - The HDFS host and port.
- --indexName <index> - The index that the input raster is stored in.
- * --inputCoverageName <name> - The name of the input raster coverage.
- * --jobSubmissionHostPort <host> - The job submission tracker host and port.
- --maxSplits <count> - The maximum partitions for the input data.
- --minSplits <count> - The minimum partitions for the input data.
- * --outputCoverageName <name> - The output raster coverage name.
- * --outputTileSize <size> - The tile size to output.
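EXAMPLES
A sketch that resizes the cov coverage from the in_store data store into 256-pixel tiles written to the out_store data store (the names and hosts are placeholders; it assumes the command is invoked as geowave raster resizemr with input and output store names as arguments):
geowave raster resizemr --hdfsHostPort localhost:8020 --jobSubmissionHostPort localhost:8032 --inputCoverageName cov --outputCoverageName cov_resized --outputTileSize 256 in_store out_store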
Resize with Spark
DESCRIPTION
This command will resize raster tiles that are stored in a GeoWave data store using Spark, and write the resized tiles to a new output store.
OPTIONS
- -ho, --host <host> - The Spark driver host. Default is localhost.
- --indexName <index> - The index that the input raster is stored in.
- * --inputCoverageName <name> - The name of the input raster coverage.
- -m, --master <designation> - The Spark master designation. Default is yarn.
- --maxSplits <count> - The maximum partitions for the input data.
- --minSplits <count> - The minimum partitions for the input data.
- -n, --name <name> - The Spark application name. Default is RasterResizeRunner.
- * --outputCoverageName <name> - The output raster coverage name.
- * --outputTileSize <size> - The tile size to output.
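EXAMPLES
The same resize as a Spark job on a local master (a sketch; the names are placeholders, and it assumes the command is invoked as geowave raster resizespark with input and output store names as arguments):
geowave raster resizespark -m local --inputCoverageName cov --outputCoverageName cov_resized --outputTileSize 256 in_store out_store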
Install GDAL
DESCRIPTION
This command installs the version of GDAL that is used by GeoWave. By default, it is installed to the GeoWave home directory under lib/utilities/gdal. If an alternate directory is provided, it should be added to the PATH environment variable for Mac and Windows users, or to the LD_LIBRARY_PATH environment variable for Linux users.
GeoServer Commands
Commands that manage GeoServer stores and layers.
Run GeoServer
Add Store
SYNOPSIS
geowave gs ds add [options] <data store name>
geowave geoserver datastore add [options] <data store name>
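EXAMPLES
Register a GeoWave data store named my_store (a placeholder) with the configured GeoServer instance:
geowave gs ds add my_store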
Get Store
SYNOPSIS
geowave gs ds get [options] <store name>
geowave geoserver datastore get [options] <store name>
Get Store Adapters
Remove Store
Add Coverage Store
SYNOPSIS
geowave gs cs add [options] <store name>
geowave geoserver coveragestore add [options] <store name>
DESCRIPTION
This command adds a coverage store to the configured GeoServer instance. It requires that a GeoWave store has already been added.
OPTIONS
- -cs, --coverageStore <name> - The name of the coverage store to add.
- -histo, --equalizeHistogramOverride - This parameter will override the behavior to always perform histogram equalization if a histogram exists.
- -interp, --interpolationOverride <value> - This will override the default interpolation stored for each layer. Valid values are 0, 1, 2, 3 for NearestNeighbor, Bilinear, Bicubic, and Bicubic (polynomial variant), respectively.
- -scale, --scaleTo8Bit - By default, integer values will automatically be scaled to 8-bit and floating point values will not. This can be overridden by setting this option.
- -ws, --workspace <workspace> - The GeoServer workspace to add the coverage store to.
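EXAMPLES
Add a coverage store for the landsatraster GeoWave data store to the geowave workspace (both names are placeholders):
geowave gs cs add -ws geowave landsatraster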
Get Coverage Store
SYNOPSIS
geowave gs cs get [options] <coverage store name>
geowave geoserver coveragestore get [options] <coverage store name>
Remove Coverage Store
Add Coverage
SYNOPSIS
geowave gs cv add [options] <coverage name>
geowave geoserver coverage add [options] <coverage name>
Get Coverage
SYNOPSIS
geowave gs cv get [options] <coverage name>
geowave geoserver coverage get [options] <coverage name>
DESCRIPTION
This command returns information about a coverage from the configured GeoServer instance.
List Coverages
SYNOPSIS
geowave gs cv list [options] <coverage store name>
geowave geoserver coverage list [options] <coverage store name>
Remove Coverage
SYNOPSIS
geowave gs cv rm [options] <coverage name>
geowave geoserver coverage rm [options] <coverage name>
Add GeoWave Layer
SYNOPSIS
geowave gs layer add [options] <data store name>
geowave geoserver layer add [options] <data store name>
DESCRIPTION
This command adds a layer from the given GeoWave data store to the configured GeoServer instance. Unlike gs fl add, this command adds a layer directly from a GeoWave data store, automatically creating the GeoWave store for it in GeoServer.
OPTIONS
- -t, --typeName <type> - Add the type with the given name to GeoServer.
- -a, --add <layer type> - Add all layers of the given type to GeoServer. Possible values are ALL, RASTER, and VECTOR.
- -sld, --setStyle <style> - The default style to use for the added layers.
- -ws, --workspace <workspace> - The GeoServer workspace to use.
EXAMPLES
Add a type called hail from the example data store to GeoServer:
geowave gs layer add -t hail example
Add all types from the example data store to GeoServer:
geowave gs layer add --add ALL example
Add all vector types from the example data store to GeoServer:
geowave gs layer add --add VECTOR example
Add Feature Layer
SYNOPSIS
geowave gs fl add [options] <layer name>
geowave geoserver featurelayer add [options] <layer name>
DESCRIPTION
This command adds a feature layer from a GeoWave store to the configured GeoServer instance.
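EXAMPLES
Add a feature layer named hail (a placeholder) to the configured GeoServer instance:
geowave gs fl add hail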
List Feature Layers
Add Style
Set Layer Style
SYNOPSIS
geowave gs style set [options] <layer name>
geowave geoserver style set [options] <layer name>
Utility Commands
Miscellaneous operations that don’t really warrant their own top-level command. This includes commands to start standalone data stores and services.
Migration Command
Run Standalone Accumulo
NAME
geowave-util-accumulo-run - Runs a standalone mini Accumulo server for test and debug with GeoWave
Run Standalone Bigtable
NAME
geowave-util-bigtable-run - Runs a standalone Bigtable instance for test and debug with GeoWave
DESCRIPTION
This command runs a standalone Bigtable instance, which can be used locally for testing and debugging GeoWave, without needing to set up a full instance.
OPTIONS
- -d, --directory <path> - The directory to use for Bigtable. Default is ./target/temp.
- -i, --interactive <enabled> - Whether to prompt for user input to end the process. Default is true.
- -p, --port <host> - The host and port the emulator will run on. Default is 127.0.0.1:8086.
- -s, --sdk <sdk> - The name of the Bigtable SDK. Default is google-cloud-sdk-183.0.0-linux-x86_64.tar.gz.
- -u, --url <url> - The URL location to download Bigtable from. Default is https://dl.google.com/dl/cloudsdk/channels/rapid/downloads.
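EXAMPLES
Run the emulator on an alternate port without prompting for user input (a sketch; the directory is a placeholder, and the command path geowave util bigtable run follows from the geowave-util-bigtable-run naming above):
geowave util bigtable run -d ./bigtable-data -i false -p 127.0.0.1:8087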
Run Standalone Cassandra
NAME
geowave-util-cassandra-run - Runs a standalone Cassandra instance for test and debug with GeoWave
DESCRIPTION
This command runs a standalone Cassandra instance, which can be used locally for testing and debugging GeoWave, without needing to set up a full instance. It will use the current working directory for its file store unless overridden by a YAML configuration.
Run Standalone DynamoDB
NAME
geowave-util-dynamodb-run - Runs a standalone DynamoDB instance for test and debug with GeoWave
DESCRIPTION
This command runs a standalone DynamoDB instance, which can be used locally for testing and debugging GeoWave, without needing to set up a full instance.
Run Standalone HBase
DESCRIPTION
This command runs a standalone HBase instance, which can be used locally for testing and debugging GeoWave, without needing to set up a full instance.
OPTIONS
- -a, --auth <authorizations> - A list of authorizations to grant the admin user.
- -d, --dataDir <path> - Directory for HBase server-side data. Default is ./lib/services/third-party/embedded-hbase/data.
- -i, --interactive - If specified, prompt for user input to end the process.
- -l, --libDir <path> - Directory for HBase server-side libraries. Default is ./lib/services/third-party/embedded-hbase/lib.
- -r, --regionServers <count> - The number of region server processes. Default is 1.
- -z, --zkDataDir <path> - The data directory for the Zookeeper instance. Default is ./lib/services/third-party/embedded-hbase/zookeeper.
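EXAMPLES
Start an instance with two region servers and wait for user input before shutting down (a sketch; it assumes the command is invoked as geowave util hbase run, following the naming convention of the other standalone commands):
geowave util hbase run -r 2 -i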
Run Standalone Kudu
DESCRIPTION
This command runs a standalone Kudu instance, which can be used locally for testing and debugging GeoWave, without needing to set up a full instance.
Run Standalone Redis
DESCRIPTION
This command runs a standalone Redis instance, which can be used locally for testing and debugging GeoWave, without needing to set up a full instance.
OPTIONS
- -d, --directory <path> - The directory to use for Redis. If set, the data will be persisted and durable; if not set, a temporary directory will be used and deleted when complete.
- -i, --interactive <enabled> - Whether to prompt for user input to end the process. Default is true.
- -m, --maxMemory <size> - The maximum memory to use (in a form such as 512M or 1G). Default is 1G.
- -p, --port <port> - The port for Redis to listen on. Default is 6379.
- -s, --setting <setting> - A setting to apply to Redis in the form of <name>=<value>.
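EXAMPLES
Run a durable instance on an alternate port with a 512M memory cap (a sketch; the directory is a placeholder, and it assumes the command is invoked as geowave util redis run, following the naming convention of the other standalone commands):
geowave util redis run -p 6380 -m 512M -d ./redis-data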
Pre-split Partition IDs
NAME
geowave-util-accumulo-presplitpartitionid - Pre-split Accumulo table by providing the number of partition IDs
Split Equal Interval
NAME
geowave-util-accumulo-splitequalinterval - Set Accumulo splits by providing the number of partitions based on an equal interval strategy
DESCRIPTION
This command allows a user to set the Accumulo splits by providing the number of partitions based on an equal interval strategy.
Split by Number of Records
NAME
geowave-util-accumulo-splitnumrecords - Set Accumulo splits by providing the number of entries per split
DESCRIPTION
This command sets the Accumulo data store splits by providing the number of entries per split.
Split Quantile Distribution
NAME
geowave-util-accumulo-splitquantile - Set Accumulo splits by providing the number of partitions based on a quantile distribution strategy
DESCRIPTION
This command allows a user to set the Accumulo data store splits by providing the number of partitions based on a quantile distribution strategy.
OSM Commands
Operations to ingest OpenStreetMap (OSM) nodes, ways, and relations into GeoWave.
Note that OSM commands are not included in GeoWave by default.
Import OSM
SYNOPSIS
geowave util osm ingest [options] <hdfs host:port> <path to base directory to read from> <store name>
OPTIONS
- -jn, --jobName <name> - The name of the MapReduce job. Default is Ingest (mcarrier).
- -m, --mappingFile <file> - Mapping file, in imposm3 form.
- --table <table> - The OSM table name in GeoWave. Default is OSM.
- * -t, --type <type> - The mapper type; one of node, way, or relation.
- -v, --visibility <visibility> - The global visibility of the data ingested (optional; if not specified, the data will be unrestricted).
- -fv, --fieldVisibility <visibility> - Specify the visibility of a specific field in the format <fieldName>:<visibility>. This option can be specified multiple times for different fields.
- -va, --visibilityAttribute <field> - Specify a field that contains visibility information for the whole row. If specified, any field visibilities defined by -fv will be ignored.
- --jsonVisibilityAttribute - If specified, the value of the visibility field defined by -va will be treated as a JSON object with keys that represent fields and values that represent their visibility.
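EXAMPLES
Ingest OSM nodes from a staged directory on HDFS into the my_store data store (the host, path, and store name are placeholders):
geowave util osm ingest -t node hdfs-host:8020 /osm/staged my_store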
Stage OSM
Landsat8 Commands
Operations to analyze, download, and ingest Landsat 8 imagery publicly available on AWS.
Analyze Landsat 8
NAME
geowave-util-landsat-analyze - Print out basic aggregate statistics for available Landsat 8 imagery
DESCRIPTION
This command prints out basic aggregate statistics for available Landsat 8 imagery.
OPTIONS
- --cql <filter> - An optional CQL expression to filter the ingested imagery. The feature type for the expression has the following attributes: shape (Geometry), acquisitionDate (Date), cloudCover (double), processingLevel (String), path (int), and row (int), and the feature ID is productId for the scene. Additionally, attributes of the individual bands can be used, such as band (String), sizeMB (double), and bandDownloadUrl (String).
- --nbestbands <count> - An option to identify and only use a set number of bands with the best cloud cover.
- --nbestperspatial - A flag that, when applied with --nbestscenes or --nbestbands, will aggregate scenes and/or bands by path/row.
- --nbestscenes <count> - An option to identify and only use a set number of scenes with the best cloud cover.
- --sincelastrun - If specified, check the scenes list from the workspace and, if it exists, only ingest data since the last scene.
- --usecachedscenes - If specified, run against the existing scenes catalog in the workspace directory if it exists.
- -ws, --workspaceDir <path> - A local directory to write temporary files needed for Landsat 8 ingest. Default is landsat8.
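EXAMPLES
Print statistics for low-cloud-cover scenes over a bounding box (the CQL filter is illustrative):
geowave util landsat analyze --cql "BBOX(shape,13.0535,52.3303,13.7262,52.6675) AND cloudCover<10" -ws ./landsat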
Download Landsat 8
OPTIONS
- --cql <filter> - An optional CQL expression to filter the ingested imagery. The feature type for the expression has the following attributes: shape (Geometry), acquisitionDate (Date), cloudCover (double), processingLevel (String), path (int), and row (int), and the feature ID is productId for the scene. Additionally, attributes of the individual bands can be used, such as band (String), sizeMB (double), and bandDownloadUrl (String).
- --nbestbands <count> - An option to identify and only use a set number of bands with the best cloud cover.
- --nbestperspatial - A flag that, when applied with --nbestscenes or --nbestbands, will aggregate scenes and/or bands by path/row.
- --nbestscenes <count> - An option to identify and only use a set number of scenes with the best cloud cover.
- --sincelastrun - If specified, check the scenes list from the workspace and, if it exists, only ingest data since the last scene.
- --usecachedscenes - If specified, run against the existing scenes catalog in the workspace directory if it exists.
- -ws, --workspaceDir <path> - A local directory to write temporary files needed for Landsat 8 ingest. Default is landsat8.
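EXAMPLES
Download the B8 band for a bounding box to a local workspace (a sketch; the filter is illustrative, and it assumes the command is invoked as geowave util landsat download, following the naming of the other Landsat commands):
geowave util landsat download --cql "BBOX(shape,13.0535,52.3303,13.7262,52.6675) AND band='B8'" -ws ./landsat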
Ingest Landsat 8
DESCRIPTION
This command downloads Landsat 8 imagery and then ingests it as raster data into GeoWave. At the same time, it ingests the scene metadata as vector data. The raster and vector data can be ingested into two separate data stores, if desired.
OPTIONS
- --converter <converter> - Prior to ingesting an image, this converter will be used to massage the data. The default is not to convert the data.
- --coverage <name> - The name to give to each unique coverage. Freemarker templating can be used for variable substitution based on the same attributes used for filtering. The default coverage name is ${productId}_${band}. If ${band} is unused in the coverage name, all bands will be merged together into the same coverage.
- --cql <filter> - An optional CQL expression to filter the ingested imagery. The feature type for the expression has the following attributes: shape (Geometry), acquisitionDate (Date), cloudCover (double), processingLevel (String), path (int), and row (int), and the feature ID is productId for the scene. Additionally, attributes of the individual bands can be used, such as band (String), sizeMB (double), and bandDownloadUrl (String).
- --crop - If specified, use the spatial constraint provided in CQL to crop the image. If no spatial constraint is provided, this will not have an effect.
- --histogram - If specified, store the histogram of the values of the coverage so that histogram equalization will be performed.
- --nbestbands <count> - An option to identify and only use a set number of bands with the best cloud cover.
- --nbestperspatial - A flag that, when applied with --nbestscenes or --nbestbands, will aggregate scenes and/or bands by path/row.
- --nbestscenes <count> - An option to identify and only use a set number of scenes with the best cloud cover.
- --overwrite - If specified, overwrite images that are ingested in the local workspace directory. By default, an existing image will be kept rather than downloaded again.
- --pyramid - If specified, store an image pyramid for the coverage.
- --retainimages - If specified, keep the images that are ingested in the local workspace directory. By default, the local file will be deleted after it is ingested successfully.
- --sincelastrun - If specified, check the scenes list from the workspace and, if it exists, only ingest data since the last scene.
- --skipMerge - By default, the ingest will automerge overlapping tiles as a post-processing optimization step for efficient retrieval; this option will skip the merge process.
- --subsample <factor> - Subsample the image prior to ingest by the scale factor provided. The scale factor should be an integer value greater than or equal to 1. Default is 1.
- --tilesize <size> - The pixel size for each tile stored in GeoWave. Default is 256.
- --usecachedscenes - If specified, run against the existing scenes catalog in the workspace directory if it exists.
- --vectorindex <index> - When ingesting as both vectors and rasters, you may want each indexed differently. This will override the index used for vector output.
- --vectorstore <store name> - When ingesting as both vectors and rasters, you may want to ingest vector data into a different data store. This will override the data store for vector output.
- -ws, --workspaceDir <path> - A local directory to write temporary files needed for Landsat 8 ingest. Default is landsat8.
EXAMPLES
Ingest and crop the B8 band of Landsat raster data over a bounding box that roughly surrounds Berlin, Germany, outputting raster data to a landsatraster data store and vector data to a landsatvector data store:
geowave util landsat ingest --nbestperspatial --nbestscenes 1 --usecachedscenes --cql "BBOX(shape,13.0535,52.3303,13.7262,52.6675) AND band='B8' AND cloudCover>0" --crop --retainimages -ws ./landsat --vectorstore landsatvector --pyramid --coverage berlin_mosaic landsatraster spatial-idx
Ingest Landsat 8 Raster
DESCRIPTION
This command downloads Landsat 8 imagery and then ingests it as raster data into GeoWave.
OPTIONS
- --converter <converter> - Prior to ingesting an image, this converter will be used to massage the data. The default is not to convert the data.
- --coverage <name> - The name to give to each unique coverage. Freemarker templating can be used for variable substitution based on the same attributes used for filtering. The default coverage name is ${productId}_${band}. If ${band} is unused in the coverage name, all bands will be merged together into the same coverage.
- --cql <filter> - An optional CQL expression to filter the ingested imagery. The feature type for the expression has the following attributes: shape (Geometry), acquisitionDate (Date), cloudCover (double), processingLevel (String), path (int), and row (int), and the feature ID is productId for the scene. Additionally, attributes of the individual bands can be used, such as band (String), sizeMB (double), and bandDownloadUrl (String).
- --crop - If specified, use the spatial constraint provided in CQL to crop the image. If no spatial constraint is provided, this will not have an effect.
- --histogram - If specified, store the histogram of the values of the coverage so that histogram equalization will be performed.
- --nbestbands <count> - An option to identify and only use a set number of bands with the best cloud cover.
- --nbestperspatial - A flag that, when applied with --nbestscenes or --nbestbands, will aggregate scenes and/or bands by path/row.
- --nbestscenes <count> - An option to identify and only use a set number of scenes with the best cloud cover.
- --overwrite - If specified, overwrite images that are ingested in the local workspace directory. By default, an existing image will be kept rather than downloaded again.
- --pyramid - If specified, store an image pyramid for the coverage.
- --retainimages - If specified, keep the images that are ingested in the local workspace directory. By default, the local file will be deleted after it is ingested successfully.
- --sincelastrun - If specified, check the scenes list from the workspace and, if it exists, only ingest data since the last scene.
- --skipMerge - By default, the ingest will automerge overlapping tiles as a post-processing optimization step for efficient retrieval; this option will skip the merge process.
- --subsample <factor> - Subsample the image prior to ingest by the scale factor provided. The scale factor should be an integer value greater than or equal to 1. Default is 1.
- --tilesize <size> - The pixel size for each tile stored in GeoWave. Default is 512.
- --usecachedscenes - If specified, run against the existing scenes catalog in the workspace directory if it exists.
- -ws, --workspaceDir <path> - A local directory to write temporary files needed for Landsat 8 ingest. Default is landsat8.
EXAMPLES
Ingest and crop the B8 band of Landsat raster data over a bounding box that roughly surrounds Berlin, Germany, outputting raster data to a landsatraster data store:
geowave util landsat ingestraster --nbestperspatial --nbestscenes 1 --usecachedscenes --cql "BBOX(shape,13.0535,52.3303,13.7262,52.6675) AND band='B8' AND cloudCover>0" --crop --retainimages -ws ./landsat --pyramid --coverage berlin_mosaic landsatraster spatial-idx
Ingest Landsat 8 Metadata
OPTIONS
- --cql <filter> - An optional CQL expression to filter the ingested imagery. The feature type for the expression has the following attributes: shape (Geometry), acquisitionDate (Date), cloudCover (double), processingLevel (String), path (int), and row (int), and the feature ID is productId for the scene. Additionally, attributes of the individual bands can be used, such as band (String), sizeMB (double), and bandDownloadUrl (String).
- --nbestbands <count> - An option to identify and only use a set number of bands with the best cloud cover.
- --nbestperspatial - A flag that, when applied with --nbestscenes or --nbestbands, will aggregate scenes and/or bands by path/row.
- --nbestscenes <count> - An option to identify and only use a set number of scenes with the best cloud cover.
- --sincelastrun - If specified, check the scenes list from the workspace and, if it exists, only ingest data since the last scene.
- --usecachedscenes - If specified, run against the existing scenes catalog in the workspace directory if it exists.
- -ws, --workspaceDir <path> - A local directory to write temporary files needed for Landsat 8 ingest. Default is landsat8.
EXAMPLES
Ingest scene and band metadata of the B8 band of Landsat raster data over a bounding box that roughly surrounds Berlin, Germany, to a landsatvector data store:
geowave util landsat ingestvector --nbestperspatial --nbestscenes 1 --usecachedscenes --cql "BBOX(shape,13.0535,52.3303,13.7262,52.6675) AND band='B8' AND cloudCover>0" -ws ./landsat landsatvector spatial-idx
Start gRPC Server
DESCRIPTION
This command starts the GeoWave gRPC server on a given port number. Remote gRPC clients can interact with GeoWave through this service.
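EXAMPLES
Start the server on port 8980 (a sketch; it assumes the command is invoked as geowave util grpc start and accepts a -p/--port option; check geowave help for the exact flags):
geowave util grpc start -p 8980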