b9737ddcadbbe8092b27df4f6ab2e6e9f3cf4c72 | Author: Wojciech Przytuła <wojciech.przytula@scylladb.com>
| 2023-06-22 15:59:19+02:00
Reresolve DNS as fallback when all hosts are unreachable
If all nodes in the cluster changed their IPs at the same time, the driver
could no longer contact the cluster at all; the only solution was to
restart the driver. A fallback is added to the control connection
`reconnect()` logic: when no known host is reachable, all hostnames
provided in ClusterConfig (the initial contact points) are re-resolved
and the driver attempts to open a control connection to any of them. If
this succeeds, a metadata fetch is issued as usual and the whole cluster
is discovered with its new IPs.
For the driver to correctly learn the new IPs when nodes are reachable
only indirectly (e.g. through a proxy), that is, by a translated address
rather than by `rpc_address` or `broadcast_address`, the code introduced
in #1682 was extended to also remove and re-add a host when its
translated address changes, even if its internal address stays the
same.
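The remove-and-re-add rule can be sketched as follows, assuming hosts are identified by a stable host ID. `hostInfo`, `hostRegistry` and `refresh` are hypothetical stand-ins; gocql's actual types and event handling differ.

```go
package main

import "fmt"

// hostInfo is a hypothetical, simplified host record.
type hostInfo struct {
	hostID         string // stable identity of the node (assumed)
	internalAddr   string // e.g. rpc_address / broadcast_address
	translatedAddr string // address the driver actually dials
}

// hostRegistry is a hypothetical set of known hosts keyed by host ID.
type hostRegistry struct {
	hosts  map[string]hostInfo
	events []string // log of add/remove actions, for illustration
}

func (r *hostRegistry) add(h hostInfo) {
	r.hosts[h.hostID] = h
	r.events = append(r.events, "add "+h.hostID+" "+h.translatedAddr)
}

func (r *hostRegistry) remove(h hostInfo) {
	delete(r.hosts, h.hostID)
	r.events = append(r.events, "remove "+h.hostID+" "+h.translatedAddr)
}

// refresh applies a host seen in freshly fetched metadata. A change in
// either address triggers remove + re-add, so connections to the stale
// translated address are dropped and new ones are opened.
func (r *hostRegistry) refresh(h hostInfo) {
	old, known := r.hosts[h.hostID]
	switch {
	case !known:
		r.add(h)
	case old.internalAddr != h.internalAddr || old.translatedAddr != h.translatedAddr:
		r.remove(old)
		r.add(h)
	}
}

func main() {
	r := &hostRegistry{hosts: map[string]hostInfo{}}
	r.refresh(hostInfo{"a", "10.0.0.1", "203.0.113.10:9042"})
	// Same internal address, new address behind the proxy.
	r.refresh(hostInfo{"a", "10.0.0.1", "203.0.113.20:9042"})
	for _, e := range r.events {
		fmt.Println(e)
	}
}
```

Comparing only the internal address would miss the second refresh above, which is exactly the proxy case the commit addresses.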
As a bonus, the misnamed variable `hostport` is renamed to the more
fitting `hostaddr`.
73398bd50d44354376363f80459b160107efe624 | Author: Wojciech Przytuła <wojciech.przytula@scylladb.com>
| 2023-05-26 13:56:24+02:00
conn: Advertise driver's name & version in STARTUP
Advertising the driver's name and version in the system.clients table
helps when debugging issues such as a connection imbalance, because it
makes it easier to narrow down the culprit application or driver.
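A sketch of the STARTUP options map with the driver identity added. `DRIVER_NAME` and `DRIVER_VERSION` are optional STARTUP keys in the CQL native protocol specification; the constant values and the `startupOptions` helper here are hypothetical placeholders, not the strings gocql actually sends.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical values; a real driver reports its own name and version.
const (
	driverName    = "example-go-driver"
	driverVersion = "1.2.3"
)

// startupOptions builds the string map sent in a CQL STARTUP frame.
// Servers that understand DRIVER_NAME/DRIVER_VERSION can surface them in
// system.clients; the keys are optional, so other servers may ignore them.
func startupOptions(cqlVersion, compression string) map[string]string {
	opts := map[string]string{
		"CQL_VERSION":    cqlVersion,
		"DRIVER_NAME":    driverName,
		"DRIVER_VERSION": driverVersion,
	}
	if compression != "" {
		opts["COMPRESSION"] = compression
	}
	return opts
}

func main() {
	opts := startupOptions("3.0.0", "")
	keys := make([]string, 0, len(opts))
	for k := range opts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, opts[k])
	}
}
```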
47d05f0d71e15504260490347ea3e329a494d1e7 | Author: Wojciech Przytuła <wojciech.przytula@scylladb.com>
| 2023-03-29 16:42:03+02:00
query_executor: Fixed race between Query.Release() and speculative executions
As described in #1315, there is a race condition when a query has more
than one concurrent execution (via speculative execution) and
Query.Release() is called after one of them completes. Query.Release()
calls Query.reset(), which in turn zeroes the whole Query. When another
execution then completes, it tries to access the query's metrics, but
they have already been set to nil by the call to Release(), so a segfault
(nil pointer dereference) is triggered.
The solution is simple: a reference counter is introduced into the
Query, initially set to 1 (the reference that Query.Release() later
drops). Every execution fiber (i.e. every execution goroutine) has the
refcount atomically incremented before it is started and decrements it
on completion. Query.Release() merely decrements the refcount.
Query.reset() is called by whichever goroutine decrements the refcount
to 0.