A pitfall with IP binding in Redis Sentinel

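While testing Sentinel I ran into a problem, and I have seen a similar issue reported on GitHub.
The test environment was CentOS 7.3 with firewalld stopped and SELinux disabled, running Redis 3.2.9 in a one-master, two-slave layout: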

Node    Host    IP            ServerPort  SentinelPort
Node 1  node59  172.16.60.59  6379        26379
Node 2  node60  172.16.60.60  6379        26379
Node 3  node61  172.16.60.61  6379        26379

172.16.60.59 is the master. Its info replication output is:
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=172.16.60.60,port=6379,state=online,offset=9496017,lag=0
slave1:ip=172.16.60.61,port=6379,state=online,offset=9496017,lag=0
master_repl_offset:9496017
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:9297179
repl_backlog_histlen:198839

Next, sentinel.conf on node59 was configured as follows:
port 26379
bind 127.0.0.1 172.16.60.59
dir "/data/redis/redis_6379"
logfile "/data/redis/redis_6379/logs/sentinel.log"
unixsocket "/tmp/sentinel_26379.socket"
daemonize yes
sentinel monitor redist 172.16.60.59 6379 2
sentinel down-after-milliseconds redist 2000
sentinel parallel-syncs redist 1
sentinel failover-timeout redist 10000
sentinel auth-pass redist z5gCkXvn9XHR93MEeZfkF2t9WHk1xwlmQH4GXJxOUBr0Ghe7YtDe5jJbBGHW8jEO

After starting Sentinel, it reports master0:name=redist,status=ok,address=172.16.60.59:6379,slaves=2,sentinels=1

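Next I configured Sentinel on node60, changing only the bind line to bind 127.0.0.1 172.16.60.60. After starting it, info sentinel reported master0:name=redist,status=sdown,address=172.16.60.59:6379,slaves=0,sentinels=1. Strange: the Sentinel on node59 worked fine, yet the one on node60 could not join the cluster. At first I suspected the configuration, but checking it over and over, and even wiping it and starting from scratch, made no difference. Time to clear my head and troubleshoot step by step:
(1) arping against the IP showed the correct MAC address, so network connectivity was fine.
(2) redis-cli could reach the ports of the master and the slaves, which ruled out a port problem.
(3) The Sentinel log showed only +monitor master redist 172.16.60.59 6379 quorum 2 and +sdown master redist 172.16.60.59 6379, with no error messages.
With all of these ruled out, nothing seemed to be wrong. So what now? I turned on debug logging and finally saw something different in the output: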
4402:X 27 Jun 13:54:54.503 # Sentinel ID is 3f78b3beb8f303c4dc63609d880f442be919fe64
4402:X 27 Jun 13:54:54.503 # +monitor master redist 172.16.60.59 6379 quorum 2
4402:X 27 Jun 13:54:54.503 . -cmd-link-reconnection master redist 172.16.60.59 6379 #Invalid argument
4402:X 27 Jun 13:54:54.503 . -pubsub-link-reconnection master redist 172.16.60.59 6379 #Invalid argument
4402:X 27 Jun 13:54:55.575 . -cmd-link-reconnection master redist 172.16.60.59 6379 #Invalid argument
4402:X 27 Jun 13:54:55.575 . -pubsub-link-reconnection master redist 172.16.60.59 6379 #Invalid argument
4402:X 27 Jun 13:54:56.530 # +sdown master redist 172.16.60.59 6379
4402:X 27 Jun 13:54:56.585 . -cmd-link-reconnection master redist 172.16.60.59 6379 #Invalid argument
4402:X 27 Jun 13:54:56.585 . -pubsub-link-reconnection master redist 172.16.60.59 6379 #Invalid argument

Invalid argument??? That made no sense to me, so I searched the Sentinel source for the code around -cmd-link-reconnection and -pubsub-link-reconnection:

    /* Commands connection. */
    if (link->cc == NULL) {
        link->cc = redisAsyncConnectBind(ri->addr->ip,ri->addr->port,NET_FIRST_BIND_ADDR);
        if (link->cc->err) {
            sentinelEvent(LL_DEBUG,"-cmd-link-reconnection",ri,"%@ #%s",
                link->cc->errstr);
            instanceLinkCloseConnection(link,link->cc);
        } else {
            link->pending_commands = 0;
            link->cc_conn_time = mstime();
            link->cc->data = link;
            redisAeAttach(server.el,link->cc);
            redisAsyncSetConnectCallback(link->cc,
                    sentinelLinkEstablishedCallback);
            redisAsyncSetDisconnectCallback(link->cc,
                    sentinelDisconnectCallback);
            sentinelSendAuthIfNeeded(ri,link->cc);
            sentinelSetClientName(ri,link->cc,"cmd");

            /* Send a PING ASAP when reconnecting. */
            sentinelSendPing(ri);
        }
    }
    /* Pub / Sub */
    if ((ri->flags & (SRI_MASTER|SRI_SLAVE)) && link->pc == NULL) {
        link->pc = redisAsyncConnectBind(ri->addr->ip,ri->addr->port,NET_FIRST_BIND_ADDR);
        if (link->pc->err) {
            sentinelEvent(LL_DEBUG,"-pubsub-link-reconnection",ri,"%@ #%s",
                link->pc->errstr);
            instanceLinkCloseConnection(link,link->pc);
        } else {
            int retval;

            link->pc_conn_time = mstime();
            link->pc->data = link;
            redisAeAttach(server.el,link->pc);
            redisAsyncSetConnectCallback(link->pc,
                    sentinelLinkEstablishedCallback);
            redisAsyncSetDisconnectCallback(link->pc,
                    sentinelDisconnectCallback);
            sentinelSendAuthIfNeeded(ri,link->pc);
            sentinelSetClientName(ri,link->pc,"pubsub");
            /* Now we subscribe to the Sentinels "Hello" channel. */
            retval = redisAsyncCommand(link->pc,
                sentinelReceiveHelloMessages, ri, "SUBSCRIBE %s",
                    SENTINEL_HELLO_CHANNEL);
            if (retval != C_OK) {
                /* If we can't subscribe, the Pub/Sub connection is useless
                 * and we can simply disconnect it and try again. */
                instanceLinkCloseConnection(link,link->pc);
                return;
            }
        }
    }

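The key point in this code is NET_FIRST_BIND_ADDR. Isn't that simply the first address given to bind? Sure enough, server.h has:
/* Get the first bind addr or NULL */
#define NET_FIRST_BIND_ADDR (server.bindaddr_count ? server.bindaddr[0] : NULL)
Found it at last; good thing I didn't give up.
I put the private IP first (bind 172.16.60.60 127.0.0.1) and restarted Sentinel. info sentinel then showed status=ok. Problem solved!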
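
The failure can be reproduced outside Sentinel with a few lines of C. This is a minimal sketch, not Redis code: connect_from is a hypothetical helper that binds a TCP socket to a chosen source address before connecting, which is roughly what redisAsyncConnectBind does with the address supplied by NET_FIRST_BIND_ADDR. The Linux behaviour described after the block is an assumption based on the log above, not something the sketch guarantees on every system.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of Redis or hiredis): bind a TCP socket to
 * src_ip, then connect to dst_ip:port. Returns 0 on success, the errno
 * value of the failing call, or -1 if an address cannot be parsed. */
int connect_from(const char *src_ip, const char *dst_ip, int port) {
    struct sockaddr_in src, dst;

    memset(&src, 0, sizeof(src));
    memset(&dst, 0, sizeof(dst));
    src.sin_family = AF_INET;
    src.sin_port = 0; /* let the kernel pick the source port */
    dst.sin_family = AF_INET;
    dst.sin_port = htons((unsigned short)port);

    if (inet_pton(AF_INET, src_ip, &src.sin_addr) != 1) return -1;
    if (inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1) return -1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return errno;

    int err = 0;
    if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0) {
        err = errno; /* the source address is not configured on this host */
    } else if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
        err = errno; /* e.g. EINVAL when a loopback source targets a remote host */
    }
    close(fd);
    return err;
}
```

Run on node60, connect_from("127.0.0.1", "172.16.60.59", 6379) would be expected to return EINVAL, which strerror() renders as the same "Invalid argument" seen in the debug log, while connect_from("172.16.60.60", "172.16.60.59", 6379) should succeed.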

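Summary: when the bind parameter is set, Sentinel uses the first bound IP as the source address for its connections to the instances it monitors. 127.0.0.1 is a loopback address, so when it is listed first, Sentinel cannot connect to IPs on other machines. node59 presumably escaped the problem only because the master runs on that same host, so its connection never leaves the machine. The first IP must therefore never be a loopback address; for every Redis-related bind, write the private IP first and the loopback IP after it.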
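
The rule can be checked mechanically before Sentinel is started. The following is a small sketch under stated assumptions: first_bind_is_loopback is a hypothetical helper, not a Redis function, and it only understands IPv4 addresses on a single bind line.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Redis): inspect one config line.
 * Returns  1 if it is a "bind" directive whose first address is IPv4 loopback,
 *          0 if it is a "bind" directive whose first address is another IPv4 address,
 *         -1 if the line is not a "bind" directive with a parseable IPv4 address. */
int first_bind_is_loopback(const char *line) {
    char directive[16], first[64];
    struct in_addr addr;

    /* Read the directive name and only the FIRST address after it. */
    if (sscanf(line, "%15s %63s", directive, first) != 2) return -1;
    if (strcmp(directive, "bind") != 0) return -1;
    if (inet_pton(AF_INET, first, &addr) != 1) return -1;

    /* 127.0.0.0/8 is the IPv4 loopback range. */
    return (ntohl(addr.s_addr) >> 24) == 127;
}
```

Feeding it each line of sentinel.conf, a result of 1 flags the ordering that caused the sdown state on node60; the corrected line bind 172.16.60.60 127.0.0.1 yields 0.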

Comments

  1. 6 years ago
    2019-1-04 0:19:18

    Spent a whole evening on this and finally fixed the same problem; leaving a polite thank-you.

  2. XXL
    2 years ago
    2023-5-31 12:45:55

    nice
