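Neo4j Graph Database Performance Testing
Configuring Neo4j on CentOS and testing import performance with test data generated by LDBC_SNB
1. Installing and deploying the Neo4j graph database
1.1 Downloading Neo4j 4 and the JDK
Neo4j 3.x requires JDK 8 (Neo4j 3.x is no longer officially supported);
Neo4j 4.x requires JDK 11;
Neo4j 5.x requires JDK 17;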
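The version pairing above can be captured in a small lookup, which is handy when a script has to decide which JDK to install. This is only a sketch: `required_jdk_for_neo4j` is a hypothetical helper name, and it encodes nothing beyond the three pairings listed above.

```shell
# Map a Neo4j version string to the JDK major version listed above.
# required_jdk_for_neo4j is a hypothetical helper, not part of Neo4j.
required_jdk_for_neo4j() {
    local neo4j_version="$1"
    local major="${neo4j_version%%.*}"   # "4.4.18" -> "4"
    case "$major" in
        3) echo 8 ;;
        4) echo 11 ;;
        5) echo 17 ;;
        *) echo "unknown Neo4j major version: $major" >&2; return 1 ;;
    esac
}
```

For example, `required_jdk_for_neo4j 4.4.18` prints `11`, which matches the JDK used in the rest of this walkthrough.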
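Neo4j official website: https://neo4j.com/
JDK download page: https://www.oracle.com/cn/java/technologies/downloads/#java11
When the versions do not match, startup fails with the following error:
1.2 Configuring environment variables for JDK 11 and Neo4j
Create the java and neo4j directories under /usr/local: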
mkdir /usr/local/java
mkdir /usr/local/neo4j
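Extract the uploaded JDK and Neo4j archives into their respective directories (tar needs -C to select the target directory):
tar -zxvf jdk-11.0.18_linux-x64_bin.tar.gz -C /usr/local/java
tar -zxvf neo4j-community-4.4.18-unix.tar.gz -C /usr/local/neo4j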
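If the same extraction has to be repeated on several machines, it can be wrapped in a small function that creates the target directory first. This is a minimal sketch; `extract_to` is a hypothetical helper name, and it relies only on the standard `-C` option of tar.

```shell
# extract_to ARCHIVE TARGET_DIR
# Unpack a .tar.gz archive into TARGET_DIR, creating the directory if needed.
# extract_to is a hypothetical helper name used for illustration.
extract_to() {
    local archive="$1" target_dir="$2"
    mkdir -p "$target_dir" || return 1
    # -C switches to the target directory before extracting
    tar -zxf "$archive" -C "$target_dir"
}
```

Usage would look like `extract_to jdk-11.0.18_linux-x64_bin.tar.gz /usr/local/java`.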
Edit the environment variables:
vim /etc/profile
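# Java environment variables
export JAVA_HOME="/usr/local/java/jdk-11.0.18"
export PATH="$JAVA_HOME/bin:$PATH"
# dt.jar and tools.jar were removed in JDK 9, so these two entries point to files that do not exist on JDK 11
export CLASSPATH=".:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar"
# Neo4j environment variables
export PATH="/usr/local/neo4j/neo4j-community-4.4.18/bin:$PATH"
Reload the profile so that the changes take effect immediately: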
source /etc/profile
# Check that the JDK is configured correctly
java -version
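To check the version in a script rather than by eye, the major version can be pulled out of the first line that `java -version` prints. The sketch below assumes the usual `... version "X.Y.Z"` layout of that line; `java_major_from_line` is a hypothetical helper name.

```shell
# java_major_from_line LINE
# Print the Java major version found in a `java -version` banner line.
# Handles both the modern scheme ("11.0.18") and the legacy one ("1.8.0_362").
java_major_from_line() {
    local line="$1" version
    # Take the text between the first pair of double quotes
    version="$(printf '%s\n' "$line" | sed -n 's/^[^"]*"\([^"]*\)".*$/\1/p')"
    [ -n "$version" ] || return 1
    case "$version" in
        1.*) version="${version#1.}" ;;   # legacy scheme: 1.8.0_362 -> 8.0_362
    esac
    # Keep everything before the first '.', '_' or '-'
    echo "${version%%[._-]*}"
}
```

Because `java -version` writes to standard error, a typical call would be `java_major_from_line "$(java -version 2>&1 | head -n 1)"`, which should print 11 for the setup described here.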
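1.3 Editing the configuration to access the Neo4j web interface from the local machine
# Edit the Neo4j configuration file so that the console can be accessed remotely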
vim /usr/local/neo4j/neo4j-community-4.4.18/conf/neo4j.conf
Change neo4j.conf as follows:
# Uncomment this line
dbms.default_listen_address=0.0.0.0
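The uncommenting step can also be done without opening an editor. The following is a sketch under the assumption that the setting is present in neo4j.conf as a commented-out line starting with `#`; `uncomment_setting` is a hypothetical helper name.

```shell
# uncomment_setting CONF_FILE KEY
# Remove the leading '#' from the line that sets KEY, leaving other lines alone.
# uncomment_setting is a hypothetical helper name used for illustration.
uncomment_setting() {
    local conf_file="$1" key="$2" escaped_key tmp_file status
    # Escape dots so that they match literally in the sed pattern
    escaped_key="$(printf '%s' "$key" | sed 's/\./\\./g')"
    tmp_file="$(mktemp)" || return 1
    if sed "s/^#[[:space:]]*\(${escaped_key}=\)/\1/" "$conf_file" > "$tmp_file"; then
        # Write back through cat so the original file keeps its permissions
        cat "$tmp_file" > "$conf_file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp_file"
    return "$status"
}
```

With the paths used in this article, the call would be `uncomment_setting /usr/local/neo4j/neo4j-community-4.4.18/conf/neo4j.conf dbms.default_listen_address`.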
Because this test runs on a local virtual machine, the firewall is simply turned off:
systemctl stop firewalld
Start Neo4j. The default login user is neo4j and the default password is neo4j.
neo4j start
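2. Generating ldbc_snb test data
Workflow for loading ldbc_snb test data into the database:
- ldbc_snb_datagen generates the test data
- ldbc_snb_interactive_impls converts the test data into the format Neo4j expects
- the CSV files are imported into the Neo4j database
2.1 Generating test data with ldbc_snb_datagen
Software preparation:
2.2 Conversion workflow with ldbc_snb_interactive_impls
The ldbc_snb_interactive_impls scripts are the implementation of the ldbc_snb Interactive workload; here they are used to convert the test data format and load it into the database.
Download: https://github.com/ldbc/ldbc_snb_interactive_impls
# Location of the scripts: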
cd /usr/local/ldbc_snb/ldbc_snb_interactive_impls-1.0.0/cypher/scripts
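Be sure to read and understand the main scripts. They are not difficult, and the steps below only go smoothly once you do.
Start by reading the .sh scripts, for example load-in-one-step.sh, which is executed first. Its execution flow is as follows:
2.3 Adjusting run parameters and modifying the .sh scripts
These commands are written for Neo4j running in a Docker container, but with a few modifications they can still be used here.
2.3.1 Adding the runtime environment variable paths to vars.sh
2.3.2 Running the script to convert the CSV files into the Neo4j import format
The script needs the two parameters NEO4J_CONVERTED_CSV_DIR and NEO4J_VANILLA_CSV_DIR at run time, as can be seen from the following:
Add the paths of the two directories to the system environment variables: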
# Open the profile
vim /etc/profile
# Add the variables for the Neo4j test data directories
export NEO4J_CONVERTED_CSV_DIR="/usr/local/neo4j/converted_csv_dir"
export NEO4J_VANILLA_CSV_DIR="/usr/local/neo4j/vanilla_csv_dir"
# Make the configuration take effect immediately
source /etc/profile
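Before running the conversion it may be worth confirming that both variables are set and point to existing directories, since a missing directory is easy to overlook. This is a minimal sketch; `check_csv_dirs` is a hypothetical helper name, and it checks only the two variables named above.

```shell
# check_csv_dirs
# Return non-zero if either CSV directory variable is unset or points nowhere.
# check_csv_dirs is a hypothetical helper name used for illustration.
check_csv_dirs() {
    local name value status=0
    for name in NEO4J_VANILLA_CSV_DIR NEO4J_CONVERTED_CSV_DIR; do
        # Read the variable whose name is stored in $name (names are fixed above)
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value" >&2
            status=1
        fi
    done
    return "$status"
}
```

Calling `check_csv_dirs && ./load-in-one-step.sh` would then start the script only when both directories are in place.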
Run the load-in-one-step.sh script. The full output is as follows:
[root@bogon scripts]# ./load-in-one-step.sh
===============================================================================
Loading the Neo4j database
-------------------------------------------------------------------------------
NEO4J_CONTAINER_ROOT: /usr/local/neo4j/neo4j-community-4.4.18
NEO4J_CONTAINER_NAME: neo4j-community-4.4.18
NEO4J_DATA_DIR: /usr/local/neo4j/neo4j-community-4.4.18/data
NEO4J_ENV_VARS: /usr/local/neo4j/neo4j-community-4.4.18/bin
NEO4J_VERSION: 4.4.18
NEO4J_VANILLA_CSV_DIR (on the host machine):
/usr/local/neo4j/vanilla_csv_dir
NEO4J_CONVERTED_CSV_DIR (on the host machine):
/usr/local/neo4j/converted_csv_dir
NEO4J_DATA_DIR (on the host machine):
/usr/local/neo4j/neo4j-community-4.4.18/data
NEO4J_CSV_POSTFIX: _0_0.csv
===============================================================================
Starting preprocessing CSV files
static/organisation: id:ID(Organisation)|:LABEL|name:STRING|url:STRING
static/place: id:ID(Place)|name:STRING|url:STRING|:LABEL
static/tagclass: id:ID(TagClass)|name:STRING|url:STRING
static/tag: id:ID(Tag)|name:STRING|url:STRING
static/tagclass_isSubclassOf_tagclass: :START_ID(TagClass)|:END_ID(TagClass)
static/tag_hasType_tagclass: :START_ID(Tag)|:END_ID(TagClass)
static/organisation_isLocatedIn_place: :START_ID(Organisation)|:END_ID(Place)
static/place_isPartOf_place: :START_ID(Place)|:END_ID(Place)
dynamic/comment: id:ID(Comment)|creationDate:LONG|locationIP:STRING|browserUsed:STRING|content:STRING|length:INT
dynamic/forum: id:ID(Forum)|title:STRING|creationDate:LONG
dynamic/person: id:ID(Person)|firstName:STRING|lastName:STRING|gender:STRING|birthday:LONG|creationDate:LONG|locationIP:STRING|browserUsed:STRING|speaks:STRING[]|email:STRING[]
dynamic/post: id:ID(Post)|imageFile:STRING|creationDate:LONG|locationIP:STRING|browserUsed:STRING|language:STRING|content:STRING|length:INT
dynamic/comment_hasCreator_person: :START_ID(Comment)|:END_ID(Person)
dynamic/comment_isLocatedIn_place: :START_ID(Comment)|:END_ID(Place)
dynamic/comment_replyOf_comment: :START_ID(Comment)|:END_ID(Comment)
dynamic/comment_replyOf_post: :START_ID(Comment)|:END_ID(Post)
dynamic/forum_containerOf_post: :START_ID(Forum)|:END_ID(Post)
dynamic/forum_hasMember_person: :START_ID(Forum)|:END_ID(Person)|joinDate:LONG
dynamic/forum_hasModerator_person: :START_ID(Forum)|:END_ID(Person)
dynamic/forum_hasTag_tag: :START_ID(Forum)|:END_ID(Tag)
dynamic/person_hasInterest_tag: :START_ID(Person)|:END_ID(Tag)
dynamic/person_isLocatedIn_place: :START_ID(Person)|:END_ID(Place)
dynamic/person_knows_person: :START_ID(Person)|:END_ID(Person)|creationDate:LONG
dynamic/person_likes_comment: :START_ID(Person)|:END_ID(Comment)|creationDate:LONG
dynamic/person_likes_post: :START_ID(Person)|:END_ID(Post)|creationDate:LONG
dynamic/person_studyAt_organisation: :START_ID(Person)|:END_ID(Organisation)|classYear:INT
dynamic/person_workAt_organisation: :START_ID(Person)|:END_ID(Organisation)|workFrom:INT
dynamic/post_hasCreator_person: :START_ID(Post)|:END_ID(Person)
dynamic/comment_hasTag_tag: :START_ID(Comment)|:END_ID(Tag)
dynamic/post_hasTag_tag: :START_ID(Post)|:END_ID(Tag)
dynamic/post_isLocatedIn_place: :START_ID(Post)|:END_ID(Place)
Finished preprocessing CSV files # format conversion succeeded
# The rest of the output is Docker-specific and can be ignored
==========================================================================2=
scripts/stop-neo4j.sh: line 11: docker: command not found
No container neo4j-community-4.4.18 found
==========================================================================3=
==========================================================================4=
scripts/import-to-neo4j.sh: line 18: docker: command not found
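The preprocessing part of the log prints, for each file, the typed header that neo4j-admin import expects (for example `id:ID(Tag)|name:STRING|url:STRING`). One way to picture that step is as a replacement of the first line of each CSV file. The sketch below illustrates only that idea and is not the LDBC script itself; `replace_csv_header` is a hypothetical helper name, and the untyped original header in the usage example is an assumption.

```shell
# replace_csv_header CSV_FILE NEW_HEADER
# Swap the first line of CSV_FILE for NEW_HEADER and keep all data rows.
# replace_csv_header is a hypothetical helper, not part of the LDBC tooling.
replace_csv_header() {
    local csv_file="$1" new_header="$2" tmp_file status
    tmp_file="$(mktemp)" || return 1
    {
        printf '%s\n' "$new_header"
        tail -n +2 "$csv_file"      # every line except the original header
    } > "$tmp_file" && cat "$tmp_file" > "$csv_file"
    status=$?
    rm -f "$tmp_file"
    return "$status"
}
```

For instance, `replace_csv_header static/tag_0_0.csv 'id:ID(Tag)|name:STRING|url:STRING'` would turn an assumed raw header such as `id|name|url` into the typed form shown in the log.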
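2.3.3 Importing the CSV files into the Neo4j database
The import-to-neo4j.sh script performs the data import for Neo4j running in a Docker container.
To make it usable here, it was rewritten as import-to-neo4j-noDocker.sh, which sets the paths explicitly and imports the data with the neo4j-admin import command.
Its content is as follows: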
[root@bogon scripts]# vim import-to-neo4j-noDocker.sh
${NEO4J_CONTAINER_ROOT}/bin/neo4j-admin import \
--id-type=INTEGER \
--nodes=Place="${NEO4J_CONTAINER_ROOT}/import/static/place${NEO4J_CSV_POSTFIX}" \
--nodes=Organisation="${NEO4J_CONTAINER_ROOT}/import/static/organisation${NEO4J_CSV_POSTFIX}" \
--nodes=TagClass="${NEO4J_CONTAINER_ROOT}/import/static/tagclass${NEO4J_CSV_POSTFIX}" \
--nodes=Tag="${NEO4J_CONTAINER_ROOT}/import/static/tag${NEO4J_CSV_POSTFIX}" \
--nodes=Comment:Message="${NEO4J_CONTAINER_ROOT}/import/dynamic/comment${NEO4J_CSV_POSTFIX}" \
--nodes=Forum="${NEO4J_CONTAINER_ROOT}/import/dynamic/forum${NEO4J_CSV_POSTFIX}" \
--nodes=Person="${NEO4J_CONTAINER_ROOT}/import/dynamic/person${NEO4J_CSV_POSTFIX}" \
--nodes=Post:Message="${NEO4J_CONTAINER_ROOT}/import/dynamic/post${NEO4J_CSV_POSTFIX}" \
--relationships=IS_PART_OF="${NEO4J_CONTAINER_ROOT}/import/static/place_isPartOf_place${NEO4J_CSV_POSTFIX}" \
--relationships=IS_SUBCLASS_OF="${NEO4J_CONTAINER_ROOT}/import/static/tagclass_isSubclassOf_tagclass${NEO4J_CSV_POSTFIX}" \
--relationships=IS_LOCATED_IN="${NEO4J_CONTAINER_ROOT}/import/static/organisation_isLocatedIn_place${NEO4J_CSV_POSTFIX}" \
--relationships=HAS_TYPE="${NEO4J_CONTAINER_ROOT}/import/static/tag_hasType_tagclass${NEO4J_CSV_POSTFIX}" \
--relationships=HAS_CREATOR="${NEO4J_CONTAINER_ROOT}/import/dynamic/comment_hasCreator_person${NEO4J_CSV_POSTFIX}" \
--relationships=IS_LOCATED_IN="${NEO4J_CONTAINER_ROOT}/import/dynamic/comment_isLocatedIn_place${NEO4J_CSV_POSTFIX}" \
--relationships=REPLY_OF="${NEO4J_CONTAINER_ROOT}/import/dynamic/comment_replyOf_comment${NEO4J_CSV_POSTFIX}" \
--relationships=REPLY_OF="${NEO4J_CONTAINER_ROOT}/import/dynamic/comment_replyOf_post${NEO4J_CSV_POSTFIX}" \
--relationships=CONTAINER_OF="${NEO4J_CONTAINER_ROOT}/import/dynamic/forum_containerOf_post${NEO4J_CSV_POSTFIX}" \
--relationships=HAS_MEMBER="${NEO4J_CONTAINER_ROOT}/import/dynamic/forum_hasMember_person${NEO4J_CSV_POSTFIX}" \
--relationships=HAS_MODERATOR="${NEO4J_CONTAINER_ROOT}/import/dynamic/forum_hasModerator_person${NEO4J_CSV_POSTFIX}" \
--relationships=HAS_TAG="${NEO4J_CONTAINER_ROOT}/import/dynamic/forum_hasTag_tag${NEO4J_CSV_POSTFIX}" \
--relationships=HAS_INTEREST="${NEO4J_CONTAINER_ROOT}/import/dynamic/person_hasInterest_tag${NEO4J_CSV_POSTFIX}" \
--relationships=IS_LOCATED_IN="${NEO4J_CONTAINER_ROOT}/import/dynamic/person_isLocatedIn_place${NEO4J_CSV_POSTFIX}" \
--relationships=KNOWS="${NEO4J_CONTAINER_ROOT}/import/dynamic/person_knows_person${NEO4J_CSV_POSTFIX}" \
--relationships=LIKES="${NEO4J_CONTAINER_ROOT}/import/dynamic/person_likes_comment${NEO4J_CSV_POSTFIX}" \
--relationships=LIKES="${NEO4J_CONTAINER_ROOT}/import/dynamic/person_likes_post${NEO4J_CSV_POSTFIX}" \
--relationships=HAS_CREATOR="${NEO4J_CONTAINER_ROOT}/import/dynamic/post_hasCreator_person${NEO4J_CSV_POSTFIX}" \
--relationships=HAS_TAG="${NEO4J_CONTAINER_ROOT}/import/dynamic/comment_hasTag_tag${NEO4J_CSV_POSTFIX}" \
--relationships=HAS_TAG="${NEO4J_CONTAINER_ROOT}/import/dynamic/post_hasTag_tag${NEO4J_CSV_POSTFIX}" \
--relationships=IS_LOCATED_IN="${NEO4J_CONTAINER_ROOT}/import/dynamic/post_isLocatedIn_place${NEO4J_CSV_POSTFIX}" \
--relationships=STUDY_AT="${NEO4J_CONTAINER_ROOT}/import/dynamic/person_studyAt_organisation${NEO4J_CSV_POSTFIX}" \
--relationships=WORK_AT="${NEO4J_CONTAINER_ROOT}/import/dynamic/person_workAt_organisation${NEO4J_CSV_POSTFIX}" \
--delimiter '|'
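import-to-neo4j-noDocker.sh is easy to follow: in essence it still uses neo4j-admin import, the tool bundled with the Neo4j distribution, to bulk-import the files.
Place the CSV files to be loaded in the import directory of the Neo4j installation, as required by the paths in the script above.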
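Copying the files into place can be scripted as well. The sketch below assumes the converted CSV files sit in `static` and `dynamic` subdirectories, mirroring the `import/static/...` and `import/dynamic/...` paths used by the import script; `stage_csv_files` is a hypothetical helper name.

```shell
# stage_csv_files CONVERTED_DIR IMPORT_DIR
# Copy the static/ and dynamic/ CSV files into the Neo4j import directory.
# stage_csv_files is a hypothetical helper name used for illustration.
stage_csv_files() {
    local converted_dir="$1" import_dir="$2" sub
    for sub in static dynamic; do
        if [ ! -d "$converted_dir/$sub" ]; then
            echo "missing directory: $converted_dir/$sub" >&2
            return 1
        fi
        mkdir -p "$import_dir/$sub" || return 1
        cp "$converted_dir/$sub"/*.csv "$import_dir/$sub/" || return 1
    done
}
```

With the variables from this article, the call would be `stage_csv_files "$NEO4J_CONVERTED_CSV_DIR" "$NEO4J_CONTAINER_ROOT/import"`, followed by running import-to-neo4j-noDocker.sh.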
The CSV files were imported into the database successfully:
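3. Testing the performance of the Neo4j graph database
Official Neo4j Cypher documentation: https://neo4j.com/docs/cypher-manual/current/clauses/
Cypher query walkthrough: https://zhuanlan.zhihu.com/p/88745411
Note: in the statements below, relation stands for a relationship type in the test data, number for a node label, and beginTime for a relationship property.
Statement 1 scenario: filter data whose time property is no later than a given point in time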
MATCH (src)-[e:relation]->(dst)
WHERE e.beginTime <= timestamp("2023-05-28T19:07:24")
RETURN e, src, dst
LIMIT 120;
Statement 2 scenario: check by vertex id whether a given tag exists
Statement: match (v:tagName) where id(v)=='2' return v.tagName.prop1
Applicable version: v3.2.0
Source: https://discuss.nebula-graph.com.cn/t/topic/11653
Match (v:number) where id(v)=='2' return v
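Statement 3 scenario: longest-path output; if the path is unreachable, return the longest known reachable path (this produced about 400,000 rows; the query does not return and the service crashes)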
match p=(a)-[e1]-(b)-[e2]-(c) optional match (c)-[e3]-(d) return p;
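Statement 4 scenario: on the first hop, return all nodes connected to the given node; group those nodes by type and keep at most 10 per type; then expand a second hop from the 10 nodes of each type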
MATCH (a)-[b]->(c) where id(a)=='1' WITH labels(c) AS ctype, collect(distinct c)[0..10] AS c_with_same_type
UNWIND c_with_same_type AS cc
OPTIONAL MATCH (cc)-[d]->(e)
RETURN cc, d, e
Statement 5 scenario: filter nodes with a specific in-degree or out-degree
MATCH (v:number)-[:relation]->()
where size((v)-[:relation]->())>10
RETURN distinct v limit 1000;
Statement 6 scenario: query paths that end at a given vertex (works normally)
MATCH p=(v)-[*1..4]-(m)
where id(v)=="20" and all( midNode in [n in nodes(p) where id(n)<>id(v) | n] where "frontPort" not in tags(midNode))
RETURN p
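Statement 7 scenario: find cycles, that is, data in the graph database that forms a loop (the query does not return and the service goes down)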
MATCH p=(v:number)-[:relation*1..10]->(v)
RETURN p
Statement 8 scenario: query nodes that have no edges
MATCH (v:number)
WHERE not (v)--()
RETURN id(v)
Statement 9 scenario: query nodes that have no incoming edges and only outgoing edges
MATCH (v:number)-->() where not (v:number)<--() RETURN id(v)
Statement 10 scenario: query nodes within a fixed number of hops from a node
MATCH p=(v)-[*1..2]-(m) where id(v)=="1" and all( number in [n in nodes(p) where id(n)<>id(v) | n] where "frontPort" not in tags(number)) RETURN p
Statement 11 scenario: query nodes related to a node within an unknown number of hops
MATCH p=(v)-[*]-(m) where id(v)=="1" and all( number in [n in nodes(p) where id(n)<>id(v) | n] where "frontPort" not in tags(number)) RETURN p
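To turn the statements above into timings, each one can be saved to a file and run through a small wrapper that reports the elapsed wall-clock time. This is a sketch under two assumptions: `date +%s%N` is available (GNU coreutils, as on CentOS), and the statements are run with cypher-shell. `elapsed_ms` is a hypothetical helper name. Note also that `==` and `tags()` in several statements follow the NebulaGraph dialect; Neo4j's Cypher uses `=` for equality and `labels()` for node labels, so those statements may need adjusting before they run on Neo4j.

```shell
# elapsed_ms COMMAND [ARGS...]
# Run COMMAND, discard its standard output, and print the elapsed time in ms.
# Assumes GNU date, which supports %N (nanoseconds).
elapsed_ms() {
    local start_ns end_ns status
    start_ns="$(date +%s%N)"
    "$@" > /dev/null
    status=$?
    end_ns="$(date +%s%N)"
    echo $(( (end_ns - start_ns) / 1000000 ))
    # Pass the command's own exit status back to the caller
    return "$status"
}
```

A call might look like `elapsed_ms cypher-shell -u neo4j -p '<password>' -f statement1.cypher`, where the file name and password are placeholders. The measured time includes client start-up and result transfer, so it is better suited to comparing statements with each other than to reading as pure query time.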
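4. Real-time resource usage of the Nebula graph database
When Nebula, another graph database, is deployed with Docker, the docker stats command is the main tool for observing each container's resource usage while a specific query scenario runs, and serves as the basis for limiting it.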
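For a scripted snapshot instead of the live view, docker stats can print a single sample in a custom format, which is then easy to post-process. The sketch below assumes a comma-separated format of name, CPU percentage and memory percentage; `busiest_container` is a hypothetical helper name.

```shell
# busiest_container
# Read "name,cpu%,mem%" lines on standard input and print the container
# with the highest CPU percentage together with that value.
# busiest_container is a hypothetical helper name used for illustration.
busiest_container() {
    awk -F',' '
        { cpu = $2; sub(/%$/, "", cpu); cpu += 0 }      # "87.25%" -> 87.25
        NR == 1 || cpu > max { max = cpu; name = $1 }
        END { if (NR > 0) printf "%s %.2f\n", name, max }
    '
}
```

It could be fed with `docker stats --no-stream --format '{{.Name}},{{.CPUPerc}},{{.MemPerc}}' | busiest_container` while a query is running. The limits themselves would be set separately, for example with the `--cpus` and `--memory` options of docker run.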
An example of the main resource usage is shown below: