Despite Elasticsearch's overall stability, it is still possible
for a cluster to get into a "red" state. One of the reasons for
that to happen is index corruption, which can be caused by an
abrupt loss of power, hardware failure or, more commonly, running
out of disk space. In this post we'll discuss how to bring the
cluster back to a healthy state with minimal or no data loss in
such a situation.
Problem setup
In our example scenario we have an Elasticsearch 5.6 cluster
running on AWS. The steps described below will also work just fine
for 6.x versions, but probably not for 2.x or earlier. Deployment
on AWS was done with terraform using our open source
elasticsearch modules. Regardless of your particular setup,
though, the Elasticsearch recovery steps will be very similar, so
keep reading.
If there is something wrong with Elasticsearch, the first thing
to do is to check cluster health:
$ curl -s https://elasticsearch.example.com:9200/_cluster/health?pretty | jq '.status'
"red"
If the cause of the cluster degradation is in fact index
corruption, it is very easy to identify which indices are at
fault, since those will have the status: "red" themselves:
$ curl -s "https://elasticsearch.example.com:9200/_cluster/health?level=indices" | \
jq '.indices | map_values(.status)'
After much googling with no success, the tempting way to recover
might be to just remove the folder with the Elasticsearch data and
start from scratch. But even in a development environment, where
data loss might be acceptable, that is a terrible solution. Most
likely only some of the indices are at fault, so there is
definitely a way to recover with far less damage.
The next section describes what to do if your Elasticsearch
cluster was deployed on AWS and the EBS volume with the data ran
out of space. On the other hand, if the file system has enough
space and something else caused the corruption, you can skip
ahead to Recover the indices.
Find some space
The first logical thing to do is to free up some space, such as
by:
- cleaning up some log/temp files
- checking if there is something other than Elasticsearch eating
up the hard drive (see the du sketch after this list), or
- simply resizing the EBS volume and then the file system on
it
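For the second point, once you are logged in to a data node (covered just below), a quick du over the data mount point will show the main offenders. This is only a sketch: it assumes the /mnt/elasticsearch mount point used later in this post, so adjust the path to your layout:
$ sudo du -xh /mnt/elasticsearch --max-depth=2 | sort -h | tail -n 10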
We'll need to SSH into each data node, whether through a bastion
host, via a VPN, or by some other means. In case our terraform
modules were used to deploy Elasticsearch, here is how to get a
list of IP addresses for all data nodes in the cluster:
$ aws ec2 describe-instances --filters \
'Name=tag:cluster,Values=elk-dev-elasticsearch-cluster' \
'Name=tag:Name,Values=*data-node*' \
'Name=instance-state-name,Values=running' \
| jq '.Reservations[].Instances[]
| { PublicIp: .PublicIpAddress, PrivateIp: .PrivateIpAddress }'
We need to log in to a data node and check its storage situation.
Assuming Elasticsearch stores data on the drive /dev/xvdf mounted
at /mnt/elasticsearch:
$ df -h | grep /dev/xvdf
/dev/xvdf 7.8G 7.2G 276M 97% /mnt/elasticsearch
Although usage is not yet at a full 100%, the cluster may already
be in a semi-functional state, and its status is likely red. But
once there is no space left at all, it is almost certain that some
indices are corrupt and API requests to store, or even retrieve,
data will result in an error.
$ lsblk | grep xvdf
xvdf 202:80 0 8G 0 disk /mnt/elasticsearch
$ df -h | grep /dev/xvdf
/dev/xvdf 7.8G 7.8G 0 100% /mnt/elasticsearch
In any case, we need to give it some space. The current EBS volume
size is 8 GB, as you might suspect, and for the sake of the example
we will double it. The path for getting that done differs depending
on how the cluster was deployed. In our case it was done with
terraform, so resizing the EBS volume is just a matter of changing
a variable and running the usual terraform apply.
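If terraform is not part of your setup, the same resize can be done directly with the AWS CLI. This is only a sketch, and the volume ID below is a placeholder for the data volume attached to your node:
$ aws ec2 modify-volume --size 16 --volume-id vol-0123456789abcdef0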
Resizing EBS volumes does not change the file system, so we must
update it manually.
Important: The steps below will have to be done
on each of the data nodes:
- Check that the EBS size was updated successfully:
$ lsblk | grep xvdf
xvdf 202:80 0 16G 0 disk /mnt/elasticsearch
- Resize the file system:
$ df -h | grep /dev/xvdf
/dev/xvdf 7.8G 7.8G 0 100% /mnt/elasticsearch
$ sudo resize2fs /dev/xvdf
resize2fs 1.42.13 (17-May-2015)
Filesystem at /dev/xvdf is mounted on /mnt/elasticsearch; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 1
The filesystem on /dev/xvdf is now 4194304 (4k) blocks long.
$ df -h | grep /dev/xvdf
/dev/xvdf 16G 7.8G 7.2G 52% /mnt/elasticsearch
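A hedged aside: resize2fs only applies to ext2/3/4 file systems, which is what our volume uses. If your data volume happens to be formatted with XFS, the equivalent step would be xfs_growfs against the mount point (check the file system type first with lsblk -f):
$ sudo xfs_growfs /mnt/elasticsearch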
Great, we've made some space. Beware that there is a limit
enforced by AWS on how many times you can resize an EBS volume per
day. Also, once you're done with recovery, you might want to
configure
curator to run recurring maintenance in order to prevent
running out of space again in the future.
Recover the indices
We've already seen how to identify the indices at fault. At this
point we could fix our problem by deleting the indices with red
status. But we can do better than that. In particular, indices with
red status most likely have their primary shards unassigned, so we
can try reassigning the shards and possibly deleting only the ones
that couldn't be recovered.
We can inspect the state of our shards with this API call:
$ curl -s https://elasticsearch.example.com:9200/_cat/shards?v
Note: If you only have one data node and are using the
default "number_of_replicas": 1, then for all indices in yellow
state you will see 50% of your shards in the UNASSIGNED state.
That is expected, since there is no other available data node
that could hold the replica shards. To fix that, you can change
the number of replicas to 0 (see the sketch below) or add at
least one more data node to the cluster.
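For reference, dropping the replica count to 0 for all existing indices is a single settings update. A minimal sketch, assuming you really don't need replicas on this cluster:
$ curl -s -XPUT "https://elasticsearch.example.com:9200/_all/_settings" \
    -H 'Content-Type: application/json' \
    -d '{"index": {"number_of_replicas": 0}}'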
It will be easy to spot malfunctioning indices, since some or all
of their primary shards will be UNASSIGNED.
What we need to do is tell Elasticsearch to try to reassign the
failed shards. Those that do not change their state to STARTED
after the attempt are probably bad and can be deleted.
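To quickly list only the unassigned primary shards, the same _cat endpoint can be filtered with a bit of awk. A small convenience sketch; in the headerless _cat/shards output, column 3 is the p/r flag and column 4 is the state:
$ curl -s "https://elasticsearch.example.com:9200/_cat/shards" | awk '$3 == "p" && $4 == "UNASSIGNED"'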
Let's look at one of our red
indices as an
example:
$ curl -s "https://elasticsearch.example.com:9200/_cluster/health?level=indices" | \
jq '.indices."elk-2018.02.07"'
{
"status": "red",
"number_of_shards": 5,
"number_of_replicas": 1,
"active_primary_shards": 0,
"active_shards": 0,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 10
}
$ curl -s "https://elasticsearch.example.com:9200/_cat/shards?v" | grep elk-2018.02.07
elk-2018.02.07 1 p UNASSIGNED
elk-2018.02.07 1 r UNASSIGNED
elk-2018.02.07 2 p UNASSIGNED
elk-2018.02.07 2 r UNASSIGNED
elk-2018.02.07 3 p UNASSIGNED
elk-2018.02.07 3 r UNASSIGNED
elk-2018.02.07 4 p UNASSIGNED
elk-2018.02.07 4 r UNASSIGNED
elk-2018.02.07 0 p UNASSIGNED
elk-2018.02.07 0 r UNASSIGNED
Here we can see that all of the shards for the index
elk-2018.02.07 are UNASSIGNED. It is possible that some of them
will be in the STARTED state, but unless all of the primary shards
(p) are started, the whole index and the cluster will be red.
Furthermore, we can inspect the exact reason why the shards for
the above index are UNASSIGNED:
$ curl -s "https://elasticsearch.example.com:9200/_cluster/state/routing_table" | jq '
.routing_table.indices
| .[] | .shards | .[] | .[]
| select(.index == "elk-2018.02.07")
| select(.unassigned_info.reason == "ALLOCATION_FAILED")
'
The filtered response will contain a list of all shards for the
index elk-2018.02.07 that couldn't be allocated, along with an
explanation of why that happened. If the cause was running out of
space, possible reasons include:
- "shard failure, reason [lucene commit failed], failure
IOException[No space left on device]"
- "failed to create shard, failure IOException[No space
left on device]"
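As an alternative to digging through the routing table, Elasticsearch 5.x and later also provide the cluster allocation explain API, which reports why a specific shard is unassigned. A sketch for primary shard 0 of the example index:
$ curl -s -XGET "https://elasticsearch.example.com:9200/_cluster/allocation/explain" \
    -H 'Content-Type: application/json' \
    -d '{"index": "elk-2018.02.07", "shard": 0, "primary": true}'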
In order to fix as much as possible, we issue a retry_failed
command by making a POST request with an empty body:
$ curl -s -XPOST "https://elasticsearch.example.com:9200/_cluster/reroute?retry_failed" | jq '
.state.routing_table.indices
| .[] | .shards | .[] | .[]
| select(.unassigned_info.reason=="ALLOCATION_FAILED")
'
Check the cluster status after issuing the reroute call, and you
should see the number of unassigned_shards
go down.
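A quick way to watch that number is to pull just the unassigned_shards field out of the health endpoint we used earlier:
$ curl -s "https://elasticsearch.example.com:9200/_cluster/health" | jq '.unassigned_shards'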
Here is the documentation on rerouting, which can be useful for
understanding what is actually going on.
Most likely, the above call will need to be issued multiple times
until all of the failed shards are reassigned. After enough tries
you should get your cluster back to "green" status. It is
possible, though, that after enough shards are reassigned the
cluster will only reach "yellow" status, and no matter how many
further retry_failed commands you issue, the unassigned_shards
number will not go down:
$ curl https://elasticsearch.example.com:9200/_cluster/health?pretty
{
"cluster_name" : "elasticsearch",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 2,
"active_primary_shards" : 21,
"active_shards" : 34,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 8,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 80.95238095238095
}
One way or another, we have already successfully restored our
cluster and it is now functional, so we can let Elasticsearch try
to heal itself by restarting the service on all data nodes:
$ sudo service elasticsearch restart
Lose some data
If at this point some of your indices are still red, and there
are shards that are corrupt and cannot be reassigned, it may be
time to send those leftover UNASSIGNED shards into the abyss.
There is no “delete shard” API call in Elasticsearch, but there
is a command to allocate an empty primary shard on a particular
data node, which is effectively the same thing, except that you
need to tell Elasticsearch which node the new shard should be
assigned to. Any node can be chosen for that purpose, since
Elasticsearch will rebalance shards later anyway, so in this
example we'll use the elk-dev-data-node-00-us-east-1a node.
Be aware: this will result in data loss!
$ RESP=$(curl -s "https://elasticsearch.example.com:9200/_cluster/state/routing_table" | jq '
.routing_table.indices
| .[] | .shards | .[] | .[]
| select(.unassigned_info.reason == "ALLOCATION_FAILED")
')
$ REQ=$(echo "$RESP" | jq 'select (.primary)
| { allocate_empty_primary: {
index: .index,
shard: .shard,
node: "elk-dev-data-node-00-us-east-1a",
accept_data_loss: true
}
}' | jq --slurp '{commands: .}')
$ curl -s -XPOST "https://elasticsearch.example.com:9200/_cluster/reroute" -d "$REQ" -H 'Content-Type: application/json'
Overwriting bad shards is guaranteed to fix the problem with
"red"
indices, and it results in much less data loss
than deleting the whole index.
If you still want to just go ahead and delete all indices with
status: "red", here is a very dangerous script that will do so.
Use it only as a very last resort, unless you really don't care
about the data:
$ RED_INDICES=$(curl -s "https://elasticsearch.example.com:9200/_cluster/health?level=indices" | \
jq -r '[.indices | to_entries[] | select(.value.status == "red") | .key] | join(",")')
$ curl -s -XDELETE "https://elasticsearch.example.com:9200/$RED_INDICES"
Conclusion
We've deployed the Elasticsearch/Logstash/Kibana (ELK) stack on
numerous occasions, and it has proved itself an amazing log
aggregation and analysis solution. It can just as well be used
for other purposes such as monitoring, structured data ingestion,
or simply document storage. Whatever your use case, if
Elasticsearch is at its center, maintenance has to be thought out
properly, and curator is a must.
As mentioned before, Elasticsearch is pretty good at staying
healthy. Disasters do happen though, and everyone's situation is
different, so if the above guide didn't solve your problem,
hopefully it at least helped narrow down the necessary solution.
Please share your experience with us by commenting in the form
below. If you need help deploying ELK or are stuck trying to
bring your Elasticsearch cluster back to life, feel free to
contact us.