commit 9a3953d1826df998f14cc6857738786d1691a326
Author: Martin Sucha <martin.sucha@kiwi.com>
Date:   2021-07-06 16:01:56+02:00

    Do not reconnect control conn when closing session
    
    When closing the session, the following deadlock can happen:
    
    1. Session.Close() - [try to close the
       session](https://github.com/gocql/gocql/blob/769848eae4625444c6abdabc4a67eacb117c9200/session.go#L450),
       this locks `Session.sessionStateMu`
    2. s.control.close() - [close the control
       connection](https://github.com/gocql/gocql/blob/769848eae4625444c6abdabc4a67eacb117c9200/session.go#L464)
    3. ch.conn.Close() - [close the connection held by the control
       connection](https://github.com/gocql/gocql/blob/769848eae4625444c6abdabc4a67eacb117c9200/control.go#L493)
    4. c.closeWithError(nil) - [close the connection without
       error](https://github.com/gocql/gocql/blob/769848eae4625444c6abdabc4a67eacb117c9200/conn.go#L555)
    5. c.close() - [close the
       connection](https://github.com/gocql/gocql/blob/769848eae4625444c6abdabc4a67eacb117c9200/conn.go#L540),
       which returns an error, e.g. `write tcp 172.XX.XXX.X:41228->XX.XXX.XX.XXX:9142: i/o timeout`
    6. c.errorHandler.HandleError(c, cerr, true) - [call error handler with returned
       error](https://github.com/gocql/gocql/blob/769848eae4625444c6abdabc4a67eacb117c9200/conn.go#L546)
    7. c.reconnect(false) - [HandleError doesn't check whether the
       connection is supposed to be closed, so it tries to
       reconnect](https://github.com/gocql/gocql/blob/769848eae4625444c6abdabc4a67eacb117c9200/control.go#L406)
    8. c.setupConn(newConn) - [try to set up a new
       connection](https://github.com/gocql/gocql/blob/769848eae4625444c6abdabc4a67eacb117c9200/control.go#L382)
    9. c.session.initialized() - [this tries to lock `Session.sessionStateMu`
       again](https://github.com/gocql/gocql/blob/769848eae4625444c6abdabc4a67eacb117c9200/control.go#L288),
       leading to a deadlock.
    
    We don't want to reconnect during Session.Close() when the reconnect
    is triggered by other paths either (like heartbeats), so the check is
    added to c.reconnect rather than to HandleError. This prevents the
    deadlock from occurring.
    
    We might add separate mutexes for the session's initialized and closed
    state in a separate commit; that alone would also be sufficient to
    remove the deadlock.
    
    Co-Authored-By: Yuto Doi <yutodoi.seattle@gmail.com>