
GeoWave Quickstart Guide - Vector Demo

In the Vector Demo, we use GeoWave to ingest and run a Kernel Density Estimation on a large set of media/broadcast data provided by the GDELT Project.

Set-Up Environment Variables

Download the GeoWave environment script.

This walkthrough uses /mnt as the install path. Modify the commands to match your install location.

cd /mnt
sudo wget s3.amazonaws.com/geowave/latest/scripts/emr/quickstart/geowave-env.sh

This script defines a number of the variables that will be used in future commands, so we will source it here.

source /mnt/geowave-env.sh
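Because later commands depend on these variables, it can help to confirm they are set before continuing. The following is a minimal sketch, not part of GeoWave: `check_env_vars` is a hypothetical helper, and the variable names passed to it are simply ones that later commands in this guide reference. Whether geowave-env.sh sets every one of them for your install is an assumption.

```shell
# Hypothetical helper: report any of the named variables that are unset or empty.
# Returns 0 when all are set, 1 otherwise.
check_env_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Variable names taken from commands later in this guide.
check_env_vars TIME_REGEX STAGING_DIR NUM_PARTITIONS \
    || echo "Re-run: source /mnt/geowave-env.sh"
```

If any name is reported as missing, source the script again in the same shell session before running the remaining steps.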

Download GDELT Data

We will be using data from the GDELT Project in this guide. For more information, visit the GDELT Project website.

Download the GDELT data files whose names match $TIME_REGEX. Sourcing the geowave-env.sh script sets this to 201602 in the example, so make sure you have sourced the environment script before running these commands.

sudo mkdir $STAGING_DIR/gdelt;cd $STAGING_DIR/gdelt
sudo wget http://data.gdeltproject.org/events/md5sums
for file in `cat md5sums | cut -d' ' -f3 | grep "^${TIME_REGEX}"` ; \
do sudo wget http://data.gdeltproject.org/events/$file ; done
md5sum -c md5sums 2>&1 | grep "^${TIME_REGEX}"
cd $STAGING_DIR
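The file-selection step in the loop above can be tried in isolation. This sketch assumes the md5sums file uses the standard md5sum layout (hash, two spaces, file name), which the `-f3` in the command above implies. The two sample lines use placeholder hashes and illustrative file names, not real GDELT checksums, and `list_matching_files` is a hypothetical helper name.

```shell
# Hypothetical helper: read md5sums-style lines on stdin and print the
# file names that start with the given prefix.
list_matching_files() {
    # With a single-space delimiter, the two spaces between hash and name
    # create an empty field 2, so the file name is field 3.
    cut -d' ' -f3 | grep "^$1"
}

# Placeholder sample data (made-up hashes, illustrative file names).
sample='d41d8cd98f00b204e9800998ecf8427e  20160131.export.CSV.zip
0cc175b9c0f1b6a831c399e269772661  20160201.export.CSV.zip'

# Only the name beginning with 201602 is printed.
printf '%s\n' "$sample" | list_matching_files 201602
```

The final `md5sum -c` command in the guide uses the same prefix filter, so only the verification results for the downloaded files are shown.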

After the data has been downloaded, we are ready to set up the store and index used to ingest the data.

Config and Ingest

  1. Add a GeoWave store

    1. If using the Standalone Installer

      1. Redis

        geowave store add gdelt -t redis --gwNamespace geowave.gdelt --address redis://127.0.0.1:6379
      2. RocksDB

        geowave store add gdelt -t rocksdb --gwNamespace geowave.gdelt
      3. HBase

        geowave store add gdelt --gwNamespace geowave.gdelt -t hbase --zookeeper localhost:2181
      4. Accumulo

        geowave store add gdelt --gwNamespace geowave.gdelt -t accumulo --zookeeper localhost:2181 --instance accumulo --user root --password secret
    2. If using EMR

      1. Accumulo

        geowave store add gdelt --gwNamespace geowave.gdelt \
        -t accumulo --zookeeper $HOSTNAME:2181 --instance accumulo \
        --user geowave --password geowave
      2. HBase

        geowave store add gdelt --gwNamespace geowave.gdelt \
        -t hbase --zookeeper $HOSTNAME:2181
      3. Cassandra

        geowave store add gdelt --gwNamespace geowave.gdelt \
        -t cassandra  --contactPoints localhost
  2. Add a spatial index

    1. If using the Standalone Installer

      geowave index add -t spatial gdelt gdelt-spatial
    2. If using EMR

      geowave index add gdelt gdelt-spatial -t spatial --partitionStrategy round_robin \
      --numPartitions $NUM_PARTITIONS
  3. Ingest the data into GeoWave

    1. If using the Standalone Installer

      geowave ingest localtogw /mnt/gdelt gdelt gdelt-spatial -f gdelt --gdelt.cql "INTERSECTS(geometry,$GERMANY)"
    2. If using EMR

      geowave ingest localtogw $STAGING_DIR/gdelt gdelt gdelt-spatial -f gdelt \
      --gdelt.cql "BBOX(geometry,${WEST},${SOUTH},${EAST},${NORTH})"

The ingest should take about 3-5 minutes. Once it has started, you can monitor its status at the HBase web interface or the Accumulo web interface if you are using one of those stores. The ingest is complete when the terminal prompt returns and accepts input again. To view the results of the vector ingest, proceed to the GeoServer Integration steps and follow only the steps for the Vector Demo Layers.
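The `--gdelt.cql` argument in the EMR ingest command is assembled by the shell before GeoWave sees it. As a minimal sketch of that expansion, the hypothetical helper `bbox_filter` below builds the same filter string from four coordinates in west, south, east, north order. The numbers are an illustrative bounding box roughly covering Germany and are an assumption, not necessarily the values set by geowave-env.sh.

```shell
# Hypothetical helper: build the CQL bounding-box filter used by the
# ingest command from west, south, east, north coordinates.
bbox_filter() {
    echo "BBOX(geometry,$1,$2,$3,$4)"
}

# Illustrative coordinates only (assumed, not read from geowave-env.sh).
bbox_filter 5.87 47.27 15.04 55.06
```

In the ingest command itself, the double quotes around the filter matter: they let the shell substitute the variables while passing the whole expression, parentheses included, to GeoWave as a single argument.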

Kernel Density Estimation (KDE)

Once the ingest has completed:

  1. Add another store for the KDE.

    1. If using the Standalone Installer

      1. Redis

        geowave store add gdelt-kde --gwNamespace geowave.gdelt -t redis --address redis://127.0.0.1:6379
      2. RocksDB

        geowave store add gdelt-kde --gwNamespace geowave.gdelt -t rocksdb
      3. HBase

        geowave store add gdelt-kde --gwNamespace geowave.gdelt -t hbase -z localhost:2181
      4. Accumulo

        geowave store add gdelt-kde --gwNamespace geowave.gdelt -t accumulo --zookeeper localhost:2181 --instance accumulo --user root --password secret
    2. If using EMR

      1. Accumulo

        geowave store add gdelt-kde --gwNamespace geowave.kde_gdelt \
        -t accumulo --zookeeper $HOSTNAME:2181 --instance accumulo --user geowave --password geowave
      2. HBase

        geowave store add gdelt-kde --gwNamespace geowave.kde_gdelt \
        -t hbase --zookeeper $HOSTNAME:2181
      3. Cassandra

        geowave store add gdelt-kde --gwNamespace geowave.kde_gdelt \
        -t cassandra  --contactPoints localhost
  2. Run the KDE analytic

    1. If using the Standalone Installer

      geowave analytic kdespark --featureType gdeltevent -m local[*] --minLevel 5 --maxLevel 26 --coverageName gdeltevent_kde gdelt gdelt-kde
    2. If using EMR

      geowave analytic kde --featureType gdeltevent --minLevel 5 \
      --maxLevel 26 --minSplits $NUM_PARTITIONS --maxSplits $NUM_PARTITIONS \
      --coverageName gdeltevent_kde --hdfsHostPort ${HOSTNAME}:${HDFS_PORT} \
      --jobSubmissionHostPort ${HOSTNAME}:${RESOURCE_MAN_PORT} --tileSize 1 gdelt gdelt-kde

The KDE can take 5-10 minutes to complete due to the size of the dataset. Once it starts, its progress is displayed in the terminal. If you are using HBase or Accumulo, you can also monitor status through the corresponding web interface.

Once the KDE has completed successfully, you should be able to view the heatmap it generated, as well as a map of all of the ingested data points. If you would like to do this before completing the Raster Demo, proceed to the GeoServer Integration section and then to the Interacting with the Cluster section. You will still be able to view the results for both demos after completing the Raster Demo.
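The store definitions in this guide (gdelt for the ingest, gdelt-kde for the KDE output) differ only in store name, namespace, and backend-specific flags. As a compact summary of the Standalone Installer options, the sketch below prints, without running, the matching `geowave store add` command. `store_add_cmd` is a hypothetical helper, and the flags and connection values are copied from the standalone commands in this guide rather than from any other source.

```shell
# Hypothetical helper: print (not run) the standalone store-add command
# for a given store name, namespace, and backend type.
store_add_cmd() {
    name=$1
    namespace=$2
    backend=$3
    base="geowave store add $name --gwNamespace $namespace -t $backend"
    case "$backend" in
        redis)    echo "$base --address redis://127.0.0.1:6379" ;;
        rocksdb)  echo "$base" ;;
        hbase)    echo "$base --zookeeper localhost:2181" ;;
        accumulo) echo "$base --zookeeper localhost:2181 --instance accumulo --user root --password secret" ;;
        *)        echo "unsupported backend: $backend" >&2; return 1 ;;
    esac
}

# Print the command for a RocksDB-backed KDE store.
store_add_cmd gdelt-kde geowave.gdelt rocksdb
```

Printing the command first makes it easy to review the flags before pasting it into the terminal.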

Raster Demo

GeoServer Integration

Interacting with the cluster