Despite Elasticsearch’s overall stability, it is still possible for a cluster to get into a "red" state. One of the ways this can happen is when an index becomes corrupt, which can be caused by an abrupt loss of power, a hardware failure, or, more commonly, running out of disk space. In this post we’ll discuss how to bring the cluster back to a healthy state with minimal or no data loss in such a situation.

Problem setup

In our example scenario we have an Elasticsearch cluster running version 5.6 on AWS. The steps described below will also work just fine for versions 6.x, but probably not for 2.x or earlier. The deployment on AWS was done with terraform using our open source elasticsearch modules. Regardless of your particular setup, though, the Elasticsearch recovery steps will be very similar, so read on.

If there is something wrong with Elasticsearch, the first thing to do is to check cluster health:

$ curl -s https://elasticsearch.example.com:9200/_cluster/health?pretty | jq '.status'
"red"

When the cause of the cluster degradation is in fact index corruption, it is easy to identify which indices are at fault, since those will have the "status": "red" themselves:

$ curl -s "https://elasticsearch.example.com:9200/_cluster/health?level=indices" | 
    jq '.indices | map_values(.status)'
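
For illustration, in a cluster with three indices the output might look like the following; all index names except elk-2018.02.07, which we will use as the broken index throughout this post, are hypothetical:

{
  ".kibana": "green",
  "elk-2018.02.06": "green",
  "elk-2018.02.07": "red"
}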

After much googling with no success, the tempting way to recover might be to simply remove the folder with the Elasticsearch data and start from scratch. But even in a development environment, where data loss might be acceptable, that is a terrible solution. Most likely only some of the indices are at fault, so there is definitely a way to recover with far less damage.

The next section describes what to do if your Elasticsearch cluster was deployed on AWS and the EBS volume holding the data ran out of space. If, on the other hand, the file system has enough space and something else caused the corruption, you can skip to Recover the indices.

Find some space

The first logical thing to do is to free up some space, for example by deleting old indices or by growing the volume that holds the data. In this guide we will take the second route and resize the EBS volume.

We’ll need to SSH into each data node, whether through a bastion host, via a VPN, or by some other means. In case our terraform modules were used to deploy Elasticsearch, here is how to get a list of IP addresses for all data nodes in the cluster:

$ aws ec2 describe-instances --filters \
    'Name=tag:cluster,Values=elk-dev-elasticsearch-cluster' \
    'Name=tag:Name,Values=*data-node*' \
    'Name=instance-state-name,Values=running' \
    | jq '.Reservations[].Instances[]
          | { PublicIp: .PublicIpAddress, PrivateIp: .PrivateIpAddress }'

We need to log in to a data node and check its storage situation. Assuming Elasticsearch stores data on the drive /dev/xvdf mounted at /mnt/elasticsearch:

$ df -h | grep /dev/xvdf
/dev/xvdf       7.8G  7.2G  276M  97% /mnt/elasticsearch

Although usage is not yet at a full 100%, the cluster may already be in a semi-functional state, and its status is likely red. Once there is no space left at all, it is almost certain that some indices are corrupt and API requests to store, or even retrieve, data will result in errors.

$ lsblk | grep xvdf
xvdf    202:80   0   8G  0 disk /mnt/elasticsearch
$ df -h | grep /dev/xvdf
/dev/xvdf       7.8G  7.8G     0 100% /mnt/elasticsearch

In any case, we need to give it some space. The current EBS volume size is 8 GB, as you might suspect. For the sake of the example we will double it. The path to getting that done differs depending on how the cluster was deployed. In our case it was done with terraform, so resizing the EBS volume is just a matter of changing a variable and running the usual terraform apply.
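
If the volume is not managed by terraform, the same resize can also be done directly with the AWS CLI. Here is a minimal sketch; the volume ID below is a placeholder, which you can look up with aws ec2 describe-volumes:

$ aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 16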

Resizing EBS volumes does not change the file system, so we must update it manually.

Important: The steps below will have to be done on each of the data nodes:

$ lsblk | grep xvdf
xvdf    202:80   0  16G  0 disk /mnt/elasticsearch
$ df -h | grep /dev/xvdf
/dev/xvdf       7.8G  7.8G     0 100% /mnt/elasticsearch
$ sudo resize2fs /dev/xvdf
resize2fs 1.42.13 (17-May-2015)
Filesystem at /dev/xvdf is mounted on /mnt/elasticsearch; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 1
The filesystem on /dev/xvdf is now 4194304 (4k) blocks long.
$ df -h | grep /dev/xvdf
/dev/xvdf        16G  7.8G  7.2G  52% /mnt/elasticsearch

Great, we’ve made some space. Beware that there is a limit enforced by AWS on how many times you can resize an EBS volume per day. Also, once you’re done with recovery, you might want to configure curator to run recurring maintenance in order to prevent running out of space again in the future.
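
As a rough sketch of what that maintenance could look like, curator's singleton CLI can delete indices older than a given age. The index prefix and the 30-day retention below are assumptions for our example, not recommendations:

$ curator_cli --host elasticsearch.example.com --port 9200 --use_ssl delete_indices \
    --filter_list '[
      {"filtertype": "pattern", "kind": "prefix", "value": "elk-"},
      {"filtertype": "age", "source": "name", "direction": "older",
       "timestring": "%Y.%m.%d", "unit": "days", "unit_count": 30}
    ]'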

Recover the indices

We’ve already seen how to identify the indices at fault. At this point we could fix our problem by deleting the indices with red status. But we can do better than that. In particular, indices with red status most likely have their primary shards unassigned, so we can try reassigning the shards and possibly deleting only the ones that couldn’t be recovered.

We can inspect the state of our shards with this API call:

$ curl -s https://elasticsearch.example.com:9200/_cat/shards?v

Note: If you only have one data node and are using the default "number_of_replicas": 1, then for all indices in the yellow state you will see 50% of your shards in the UNASSIGNED state. This is expected, since there is no other available data node that could hold the replica shards. In order to fix that, you can change the number of replicas to 0 or add at least one more data node to the cluster.
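
For example, dropping the replica count to 0 for every index could look like this minimal sketch (adjust the index pattern if you only want to touch some of them):

$ curl -s -XPUT "https://elasticsearch.example.com:9200/_all/_settings" \
    -H 'Content-Type: application/json' \
    -d '{"index": {"number_of_replicas": 0}}'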

It will be easy to spot malfunctioning indices, since either all or some of their primary shards will be UNASSIGNED. What we need to do is to tell Elasticsearch to try to reassign failed shards. Those which do not change their state to STARTED after the attempt could be bad and can be deleted.

Let’s look at one of our red indices as an example:

$ curl -s "https://elasticsearch.example.com:9200/_cluster/health?level=indices" | 
    jq '.indices."elk-2018.02.07"'
{
  "status": "red",
  "number_of_shards": 5,
  "number_of_replicas": 1,
  "active_primary_shards": 0,
  "active_shards": 0,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 10
}
$ curl -s "https://elasticsearch.example.com:9200/_cat/shards?v" | grep elk-2018.02.07
elk-2018.02.07            1     p      UNASSIGNED
elk-2018.02.07            1     r      UNASSIGNED
elk-2018.02.07            2     p      UNASSIGNED
elk-2018.02.07            2     r      UNASSIGNED
elk-2018.02.07            3     p      UNASSIGNED
elk-2018.02.07            3     r      UNASSIGNED
elk-2018.02.07            4     p      UNASSIGNED
elk-2018.02.07            4     r      UNASSIGNED
elk-2018.02.07            0     p      UNASSIGNED
elk-2018.02.07            0     r      UNASSIGNED

Here we can see that all of the shards for the index elk-2018.02.07 are UNASSIGNED. It is possible that some of them will be in STARTED state, but unless all of the primary shards (p) are started, the whole index and the cluster will be red.
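
A quick way to list only the problematic primary shards across all indices is to filter the _cat output; this small sketch relies on the default _cat/shards column order (index, shard, prirep, state, ...):

$ curl -s "https://elasticsearch.example.com:9200/_cat/shards" \
    | awk '$3 == "p" && $4 != "STARTED"'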

Furthermore, we can inspect the exact reason why the shards of the above index are UNASSIGNED:

$ curl -s "https://elasticsearch.example.com:9200/_cluster/state/routing_table" | jq '
    .routing_table.indices
    | .[] | .shards | .[] | .[]
    | select(.index == "elk-2018.02.07")
    | select(.unassigned_info.reason == "ALLOCATION_FAILED")
    '

The filtered response will contain a list of all shards for the index elk-2018.02.07 that could not be allocated, along with an explanation of why that happened. When the cluster ran out of space, the unassigned_info details will typically point at failed shards with I/O errors such as "no space left on device".

In order to fix as much as possible, we issue a retry_failed command by making a POST request with an empty body; the jq filter below keeps only the shards that still could not be allocated:

$ curl -s -XPOST "https://elasticsearch.example.com:9200/_cluster/reroute?retry_failed" | jq '
    .state.routing_table.indices
    | .[] | .shards | .[] | .[]
    | select(.unassigned_info.reason=="ALLOCATION_FAILED")
    '

Check the cluster status after issuing the reroute call, and you should see the number of unassigned_shards go down. Here is documentation on rerouting that can be useful in understanding what is actually going on.
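
For instance, to watch just the relevant counters between retries, a minimal sketch:

$ curl -s "https://elasticsearch.example.com:9200/_cluster/health" \
    | jq '{status, unassigned_shards, active_shards_percent_as_number}'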

Most likely, the above call will need to be issued multiple times, until all of the failed shards are reassigned. After enough tries you should get your cluster back to the "green" status. It is possible, though, that after enough shards are reassigned the cluster only reaches the "yellow" status, and no matter how many further retry_failed commands you issue, the unassigned_shards number will not go down:

$ curl https://elasticsearch.example.com:9200/_cluster/health?pretty
{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 21,
  "active_shards" : 34,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 8,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 80.95238095238095
}

One way or another, at this point we have successfully restored our cluster to a functional state. Now we can let Elasticsearch try to heal itself further by restarting the service on all data nodes:

$ sudo service elasticsearch restart

Lose some data

If at this point some of your indices are still red, and there are shards that are corrupt and cannot be reassigned, it may be time to send those leftover UNASSIGNED shards into the abyss.

There is no “delete shard” API call in Elasticsearch, but there is a command to allocate an empty primary shard on a particular data node, which is effectively the same thing, except that you need to tell Elasticsearch which node the new shard should be assigned to. An arbitrary node can be chosen for that purpose, since Elasticsearch will rebalance shards later anyway, so in this example we’ll use the elk-dev-data-node-00-us-east-1a node. Be aware that this will result in data loss!
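
For a single shard the reroute command boils down to something like this; shard 0 of the index from our earlier example is used here for illustration:

$ curl -s -XPOST "https://elasticsearch.example.com:9200/_cluster/reroute" \
    -H 'Content-Type: application/json' \
    -d '{
      "commands": [
        { "allocate_empty_primary": {
            "index": "elk-2018.02.07",
            "shard": 0,
            "node": "elk-dev-data-node-00-us-east-1a",
            "accept_data_loss": true
        } }
      ]
    }'

To apply this to every remaining failed primary shard at once, we can script it: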

$ RESP=$(curl -s "https://elasticsearch.example.com:9200/_cluster/state/routing_table" | jq '
    .routing_table.indices
    | .[] | .shards | .[] | .[]
    | select(.unassigned_info.reason == "ALLOCATION_FAILED")
    ')
$ REQ=$(echo "$RESP" | jq 'select (.primary)
    | { allocate_empty_primary: {
           index: .index,
           shard: .shard,
           node: "elk-dev-data-node-00-us-east-1a",
           accept_data_loss: true
           }
    }' | jq --slurp '{commands: .}')
$ curl -s -XPOST "https://elasticsearch.example.com:9200/_cluster/reroute" -d "$REQ" -H 'Content-Type: application/json'

Overwriting bad shards is guaranteed to fix the problem with "red" indices, and it results in much less data loss than deleting the whole index.

If you still want to just go ahead and delete all indices with status "red", here is a very dangerous script that will do so. Use it only as a very last resort, or if you really don’t care about the data:

$ RED_INDICES=$(curl -s "https://elasticsearch.example.com:9200/_cluster/health?level=indices" | 
    jq -r '[.indices | to_entries[] | select(.value.status == "red") | .key] | join(",")')
$ curl -s -XDELETE "https://elasticsearch.example.com:9200/$RED_INDICES"

Conclusion

We’ve deployed the Elasticsearch/Logstash/Kibana (ELK) stack on numerous occasions, and it has proved itself an amazing log aggregation and analysis solution. It can just as well be used for other purposes such as monitoring, structured data ingestion, or simply document storage. Whatever your use case is, if Elasticsearch is at its center, maintenance has to be thought out properly, and curator is a must.

As mentioned before, Elasticsearch is pretty good at staying healthy. Disasters do happen though, and everyone’s situation is different, so if the above guide didn’t solve your problem, hopefully it at least helped narrow down the necessary solution. Please share your experience with us by commenting in the form below. If you need help deploying ELK, or you are stuck trying to bring your Elasticsearch cluster back to life, feel free to contact us.
