A few words up front:
Building the index in a distributed full-text search engine like Elasticsearch while the data itself is stored in HBase is a very common and very important requirement; travel sites, video sites, and classifieds sites like 58.com all depend on this kind of technique.
Elasticsearch here builds on an existing HDFS and ZooKeeper environment.
Getting started: installing Elasticsearch
Note: entries in yml files must start at column 0, and the space after each colon must be kept.
Environment preparation
CentOS, VMware, 4 virtual machines:
hadoop1, hadoop2, hadoop3, hadoop4; install elasticsearch-2.2.0.tar.gz on all nodes.
Step 1: unpack elasticsearch-2.2.0.tar.gz
tar -zvxf elasticsearch-2.2.0.tar.gz
Move it under /home
mv elasticsearch-2.2.0 /home/
Change into the directory
cd /home/elasticsearch-2.2.0
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please see the documentation for further information on configuration options:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration.html>
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
# mind the whitespace: the entry must start at column 0, with a space after the colon
cluster.name: chenkl
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
node.name: node-1
#
# Add custom attributes to the node:
#
# node.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
# path.data: /path/to/data
#
# Path to log files:
#
# path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
# bootstrap.mlockall: true
#
# Make sure that the `ES_HEAP_SIZE` environment variable is set to about half the memory
# available on the system and that the owner of the process is allowed to use this limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
network.host: 192.168.25.151
#
# Set a custom port for HTTP:
#
# http.port: 9200
#
# For more information, see the documentation at:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html>
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
# discovery.zen.ping.unicast.hosts: ["host1", "host2"]
#
# Prevent the "split brain" by configuring the majority of nodes (total number of nodes / 2 + 1):
#
# discovery.zen.minimum_master_nodes: 3
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
# gateway.recover_after_nodes: 3
#
# For more information, see the documentation at:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/modules-gateway.html>
#
# ---------------------------------- Various -----------------------------------
#
# Disable starting multiple nodes on a single system:
#
# node.max_local_storage_nodes: 1
#
# Require explicit names when deleting indices:
#
# action.destructive_requires_name: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping_timeout: 120s
client.transport.ping_timeout: 60s
discovery.zen.ping.unicast.hosts: ["192.168.25.151","192.168.25.152","192.168.25.153","192.168.25.154"]
Copy this file to the other machines
and change the per-node settings to match each machine, e.g. these are hadoop1's values:
node.name: node-1
network.host: 192.168.25.151
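Editing those two lines by hand on every machine is error-prone. As a minimal local sketch (node-2's name and IP are assumed from the tutorial's numbering), sed can stamp in the per-node values before the file is shipped out:

```shell
# Sketch only: write the two per-node lines to a scratch file, stamp
# in node-2's values, and show the result. On a real cluster you would
# edit a full copy of config/elasticsearch.yml the same way.
cat > /tmp/es-node2.yml <<'EOF'
node.name: node-1
network.host: 192.168.25.151
EOF
sed -i -e 's/^node\.name: .*/node.name: node-2/' \
       -e 's/^network\.host: .*/network.host: 192.168.25.152/' /tmp/es-node2.yml
cat /tmp/es-node2.yml
# then, on the real cluster, something like:
# scp /tmp/es-node2.yml root@192.168.25.152:/home/elasticsearch-2.2.0/config/elasticsearch.yml
```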
Create a dedicated user on all nodes (Elasticsearch refuses to run as root), then start the service as that user with bin/elasticsearch.
Ctrl+C stops the foreground service started above.
At this point Elasticsearch is installed and starts up correctly.
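The startup commands at the end of this tutorial switch to a `bigdata` user. A sketch of creating that user and handing it the install directory (run as root on every node; the user name is simply the one this tutorial uses):

```shell
# Run as root on every node. "bigdata" matches the user used in the
# startup commands later in this tutorial.
useradd bigdata
chown -R bigdata:bigdata /home/elasticsearch-2.2.0
# then start elasticsearch as that user:
su bigdata -c '/home/elasticsearch-2.2.0/bin/elasticsearch'
```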
Check the cluster state:
192.168.25.151:9200/_cluster/health
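The same endpoint can be polled with curl from any shell; a sketch against this tutorial's cluster:

```shell
# Pretty-printed cluster health; "status" is the field to watch:
# green  = all shards allocated
# yellow = replicas unassigned
# red    = primary shards missing
curl -s 'http://192.168.25.151:9200/_cluster/health?pretty'
```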
1. Installing Kibana
kibana-4.4.1-linux-x64.tar.gz
Pick any one virtual machine, say hadoop1: unpack it there, then edit the config file (vi config/kibana.yml) and set the elasticsearch.url property; that is all.
Note: in every .yml config file, each colon must be followed by a space.
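For example, in kibana.yml:

```yaml
# correct: starts at column 0, one space after the colon
elasticsearch.url: "http://192.168.25.151:9200"
# wrong: no space after the colon -- the file will not parse
# elasticsearch.url:"http://192.168.25.151:9200"
```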
Installing the Elasticsearch-side plugin: all nodes.
The Kibana side is installed only on the single node that has Kibana:
[root@hadoop1 kibana-4.4.1-linux-x64]# bin/kibana plugin --install elasticsearch/marvel/latest
marvel-2.2.1 pairs cleanly with kibana-4.4.1, so pin the plugin version when installing:
[root@hadoop1 kibana-4.4.1-linux-x64]# bin/kibana plugin --install elasticsearch/marvel/2.2.1
To uninstall or remove the plugin:
[root@hadoop1 kibana-4.4.1-linux-x64]# bin/kibana plugin -r marvel
Next, start elasticsearch and kibana.
Open http://192.168.25.151:9200 in a browser to reach Elasticsearch,
and http://192.168.25.151:5601 to reach Kibana.



Integrating the Chinese ik analyzer into Elasticsearch (a pre-built binary), on all nodes.
The version mapping between the ik analyzer and ES, and how to build ik yourself for use in ES, are documented at:
https://github.com/medcl/elasticsearch-analysis-ik/tree/v6.2.4
Unpack elasticsearch-analysis-ik-1.8.0.zip into the plugins/ik directory.
It should end up containing these files:
commons-codec-1.9.jar
commons-logging-1.2.jar
config
elasticsearch-analysis-ik-1.8.0.jar
httpclient-4.4.1.jar
httpcore-4.4.1.jar
plugin-descriptor.properties
Copy it to the other nodes:
[root@hadoop1 ik]# scp -r ./ root@192.168.25.152:/home/elasticsearch-2.2.0/plugins/ik/
[root@hadoop1 ik]# scp -r ./ root@192.168.25.153:/home/elasticsearch-2.2.0/plugins/ik/
[root@hadoop1 ik]# scp -r ./ root@192.168.25.154:/home/elasticsearch-2.2.0/plugins/ik/
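Once the nodes are restarted, the analyzer can be sanity-checked with the _analyze API; a sketch against this cluster (the sample sentence is arbitrary):

```shell
# Ask ES to tokenize a sample phrase with ik_max_word. If the plugin is
# loaded on the node you hit, a token list comes back instead of an
# "analyzer not found" error.
curl -s 'http://hadoop1:9200/_analyze?analyzer=ik_max_word&pretty' -d '中华人民共和国'
```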
Set up the search fields with the ik analyzer and create the index:
cd /home/
vi dkjhl.json
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0
  },
  "mappings": {
    "doc": {
      "dynamic": "strict",
      "properties": {
        "id": {"type": "integer", "store": "yes"},
        "title": {"type": "string", "store": "yes", "index": "analyzed", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word"},
        "describe": {"type": "string", "store": "yes", "index": "analyzed", "analyzer": "ik_max_word", "search_analyzer": "ik_max_word"},
        "author": {"type": "string", "store": "yes", "index": "no"}
      }
    }
  }
}
To delete the index:
[root@hadoop1 home]# curl -XDELETE 'hadoop1:9200/dkjhl'
To create the index:
[root@hadoop1 home]# curl -XPOST 'hadoop1:9200/dkjhl' -d @dkjhl.json
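With the index created, a quick end-to-end check is to index one document and search it back; the document body and query text below are made-up samples, not part of the original tutorial:

```shell
# Index one document into the "doc" type defined in dkjhl.json
# (the field values are illustrative samples):
curl -XPOST 'hadoop1:9200/dkjhl/doc/1' -d '{
  "id": 1,
  "title": "中华人民共和国国歌",
  "describe": "国歌简介",
  "author": "test"
}'
# Search the ik-analyzed title field:
curl 'hadoop1:9200/dkjhl/doc/_search?pretty' -d '{
  "query": { "match": { "title": "国歌" } }
}'
```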
Starting Elasticsearch the next time:
On all nodes:
cd /home/elasticsearch-2.2.0
su bigdata
$ bin/elasticsearch
On the hadoop1 node:
bin/kibana
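If the services should survive logging out of the shell, both can be started in the background; a sketch (Elasticsearch's own -d daemon flag, plus nohup for Kibana):

```shell
# On every ES node, as the bigdata user, from /home/elasticsearch-2.2.0:
bin/elasticsearch -d               # -d detaches elasticsearch as a daemon
# On hadoop1, from the kibana directory:
nohup bin/kibana > kibana.log 2>&1 &
```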