refactor(p2p): implement PeerConnections [part 2/11] #1174

glevco · 2024-11-06T15:06:29Z

Depends on #1173

Motivation

This PR is mostly a refactor for the P2P Multiprocess project, but it contains several general improvements and also a small bug fix.

Currently, the ConnectionsManager class keeps track of existing connections in their different states in multiple attributes with heterogeneous types. Connecting peers are identified by IStreamClientEndpoints, handshaking peers are identified by protocol instances themselves, ready peers are identified by PeerIds, and there's also a connections attribute that holds all of them as protocol instances too.

The main refactor of this PR is moving, renaming, and changing all those attributes. They're moved to the new PeerConnections class, that now holds all logic related to handling the connections as they advance in their state machine.

Now, every connection is uniquely identified by its connection address, which is present in both inbound and outbound connections (contrary to the entrypoint, which is removed). This is less error prone as its now impossible to connect to the same address twice (there was a small bug that briefly allowed bootstrap connections to the same address). Renames are also done to reduce confusion, specifically "ready peers" was called "connected peers" and is updated.

Here's the list of ConnectionsManager attributes that are moved to PeerConnections and updated to use PeerAddress:

connecting_peers: dict[IStreamClientEndpoint, _ConnectingPeer] → _connecting_outbound: set[PeerAddress].
handshaking_peers: set[HathorProtocol] → _handshaking: dict[PeerAddress, HathorProtocol].
connected_peers: dict[PeerId, HathorProtocol] → _ready: dict[PeerAddress, HathorProtocol].
connections: set[HathorProtocol] → removed, but equivalent methods are provided as get_connected_peers or similar.

Acceptance Criteria

Introduce new PeerConnections class, moving some connection logic from the ConnectionsManager to it.
Peer connections, both inbound and outbound, now use an addr: PeerAddress attribute to be uniquely identified. The entrypoint attribute is removed.
Remove use_ssl parameter from ConnectionsManager.connect_to and listen — this was unused, the instance attribute is used instead.
Remove "deferred" key from status API and update its address strings.
Add more tests.

Checklist

If you are requesting a merge into master, confirm this code is production-ready and can be included in future releases as soon as it gets merged

glevco · 2024-11-06T15:09:26Z

hathor/p2p/manager.py

        self.pubsub.publish(
            HathorEvents.NETWORK_PEER_CONNECTION_FAILED,
-            peer=peer,


Removing this for simplicity as it's not used.

glevco · 2024-11-06T15:10:00Z

hathor/p2p/manager.py

+        if endpoint.peer_id is not None and peer is not None:
+            assert endpoint.peer_id == peer.id, 'the entrypoint peer_id does not match the actual peer_id'


This is an assertion because the only caller that passes a peer constructs the endpoint using the peer.id. In another PR this may be further improved so this assertion can be removed.

glevco · 2024-11-06T15:10:18Z

hathor/p2p/manager.py

-    def on_peer_disconnect(self, protocol: HathorProtocol) -> None:
-        """Called when a peer disconnect."""
-        self.connections.discard(protocol)
-        if protocol in self.handshaking_peers:
-            self.handshaking_peers.remove(protocol)
-        if protocol._peer is not None:
-            existing_protocol = self.connected_peers.pop(protocol.peer.id, None)
-            if existing_protocol is None:
-                # in this case, the connection was closed before it got to READY state
-                return
-            if existing_protocol != protocol:
-                # this is the case we're closing a duplicate connection. We need to set the
-                # existing protocol object back to connected_peers, as that connection is still ongoing.
-                # A check for duplicate connections is done during PEER_ID state, but there's still a
-                # chance it can happen if both connections start at the same time and none of them has
-                # reached READY state while the other is on PEER_ID state
-                self.connected_peers[protocol.peer.id] = existing_protocol


This logic was simplified by calling a specific method for each state.

glevco · 2024-11-06T15:10:43Z

hathor/p2p/manager.py

-    def get_connection_to_drop(self, protocol: HathorProtocol) -> HathorProtocol:
-        """ When there are duplicate connections, determine which one should be dropped.
-
-        We keep the connection initiated by the peer with larger id. A simple (peer_id1 > peer_id2)
-        on the peer id string is used for this comparison.
-        """
-        assert protocol.peer is not None
-        assert protocol.peer.id is not None
-        assert protocol.my_peer.id is not None
-        other_connection = self.connected_peers[protocol.peer.id]
-        if bytes(protocol.my_peer.id) > bytes(protocol.peer.id):
-            # connection started by me is kept
-            if not protocol.inbound:
-                # other connection is dropped
-                return other_connection
-            else:
-                # this was started by peer, so drop it
-                return protocol
-        else:
-            # connection started by peer is kept
-            if not protocol.inbound:
-                return protocol
-            else:
-                return other_connection
-


This was moved to PeerConnections and simplified.

glevco · 2024-11-06T15:11:15Z

hathor/p2p/peer.py

-            else:
-                # When the peer is the server part of the connection we don't have the full entrypoint description
-                # So we can only validate the host from the protocol
-                assert protocol.transport is not None
-                connection_remote = protocol.transport.getPeer()
-                connection_host = getattr(connection_remote, 'host', None)
-                if connection_host is None:
-                    continue
-                # Connection host has only the IP
-                # So we must consider that the entrypoint could be in name format and we just validate the host
-                if connection_host == entrypoint.host:
-                    return True
-                test_mode = not_none(DifficultyAdjustmentAlgorithm.singleton).TEST_MODE
-                result = await discover_dns(entrypoint.host, test_mode)
-                if connection_host in [entrypoint.addr.host for entrypoint in result]:


I was able to remove this because before we only had entrypoint for outbound peers, but now we have addr for both inbound and outbound peers.

glevco · 2024-11-06T15:11:53Z

hathor/p2p/states/peer_id.py

-        if protocol.entrypoint is not None and protocol.entrypoint.peer_id is not None:
-            assert protocol.entrypoint.peer_id == peer.id
-


The protocol does not contain an entrypoint anymore, so this can be removed.

glevco · 2024-11-07T01:01:38Z

tests/p2p/test_protocol.py

+        if bytes(self.peer1.id) > bytes(self.peer2.id):
+            tr_dead = self.conn.tr1
+            tr_dead_value = self.conn.peek_tr1_value()
+            proto_alive = conn.proto2
            conn_alive = conn
-        elif conn.tr1.disconnecting or conn.tr2.disconnecting:
-            conn_dead = conn
-            conn_alive = self.conn
        else:
-            raise Exception('It should never happen.')
-        self._check_result_only_cmd(conn_dead.peek_tr1_value() + conn_dead.peek_tr2_value(), b'ERROR')
+            tr_dead = conn.tr2
+            tr_dead_value = conn.peek_tr2_value()
+            proto_alive = self.conn.proto1
+            conn_alive = self.conn
+
+        self._check_result_only_cmd(tr_dead_value, b'ERROR')
        # at this point, the connection must be closing as the error was detected on READY state
-        self.assertIn(True, [conn_dead.tr1.disconnecting, conn_dead.tr2.disconnecting])
-        # check connected_peers
-        connected_peers = list(self.manager1.connections.connected_peers.values())
-        self.assertEquals(1, len(connected_peers))
-        self.assertIn(connected_peers[0], [conn_alive.proto1, conn_alive.proto2])
+        self.assertTrue(tr_dead.disconnecting)
+        # check ready_peers
+        ready_peers = list(self.manager1.connections.iter_ready_connections())
+        self.assertEquals(1, len(ready_peers))
+        self.assertEquals(ready_peers[0], proto_alive)


Since I refactored the method that chooses which connection to drop, I updated this test to actually test the specific connection that is dropped, which is important.

Before, it would just test that some connection is dropped, but not which exact connection is dropped.

glevco · 2024-11-07T01:06:39Z

hathor/p2p/peer_connections.py

+    @staticmethod
+    def _should_drop_new_connection(new_conn: HathorProtocol) -> bool:
+        """
+        When there are connections with duplicate PeerIds, determine which one should be dropped, the old or the new.
+        Return True if we should drop the new connection, and False otherwise.
+
+        The logic to determine this is `(my_peer_id > other_peer_id) XNOR new_conn.inbound`.
+        """
+        my_peer_is_larger = bytes(new_conn.my_peer.id) > bytes(new_conn.peer.id)
+        return my_peer_is_larger == new_conn.inbound


I refactored this from ConnectionsManager, simplifying its logic to a simple boolean operation. The old docstring and comments were incorrect, because it stated a decision to drop either the connection initiated by us, or the one initiated by the other peer.

This was incorrect because it's possible that both duplicate connections are initiated by the other peer, for example. Therefore the correct description is just choosing whether to drop the old connection or the new, independent of who initiated them. The new_conn.inbound is used in the boolean logic keeping the exact same behavior as before, but it doesn't actually convey any meaning.

codecov · 2024-11-07T01:19:12Z

Codecov Report

Attention: Patch coverage is 88.70293% with 27 lines in your changes missing coverage. Please review.

Project coverage is 84.96%. Comparing base (4da0fb9) to head (8ed5b0d).

Files with missing lines	Patch %	Lines
hathor/p2p/manager.py	82.41%	10 Missing and 6 partials ⚠️
hathor/p2p/peer_connections.py	91.76%	6 Missing and 1 partial ⚠️
hathor/p2p/protocol.py	86.95%	2 Missing and 1 partial ⚠️
hathor/p2p/states/peer_id.py	0.00%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1174      +/-   ##
==========================================
+ Coverage   84.85%   84.96%   +0.10%     
==========================================
  Files         323      324       +1     
  Lines       25044    25101      +57     
  Branches     3846     3838       -8     
==========================================
+ Hits        21252    21327      +75     
  Misses       3046     3046              
+ Partials      746      728      -18

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚨 Try these New Features:

Flaky Tests Detection - Detect and resolve failed and flaky tests

glevco added the refactor label Nov 6, 2024

glevco self-assigned this Nov 6, 2024

glevco changed the title ~~refactor(p2p): implement PeerConnections~~ refactor(p2p): implement PeerConnections [part 2/10] Nov 6, 2024

glevco marked this pull request as ready for review November 6, 2024 15:06

glevco requested review from jansegre and msbrogli as code owners November 6, 2024 15:06

glevco commented Nov 6, 2024

View reviewed changes

glevco mentioned this pull request Nov 6, 2024

feat(reactor): add IReactorProcess and IReactorSocket support [part 3/11] #1163

Open

1 task

glevco force-pushed the refactor/p2p/peer-connections branch from 09ef898 to 497dac3 Compare November 7, 2024 00:58

glevco commented Nov 7, 2024

View reviewed changes

jansegre deleted the branch master November 7, 2024 14:41

jansegre closed this Nov 7, 2024

glevco reopened this Nov 7, 2024

glevco changed the base branch from refactor/p2p/peer-address-handling to master November 7, 2024 14:48

glevco changed the title ~~refactor(p2p): implement PeerConnections [part 2/10]~~ refactor(p2p): implement PeerConnections [part 2/11] Nov 7, 2024

glevco force-pushed the refactor/p2p/peer-connections branch from 497dac3 to eee9bb0 Compare November 8, 2024 15:44

glevco force-pushed the refactor/p2p/peer-connections branch from eee9bb0 to 8fbf968 Compare November 25, 2024 15:49

refactor(p2p): implement PeerConnections

8ed5b0d

glevco force-pushed the refactor/p2p/peer-connections branch from 8fbf968 to 8ed5b0d Compare November 25, 2024 16:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(p2p): implement PeerConnections [part 2/11] #1174

refactor(p2p): implement PeerConnections [part 2/11] #1174

glevco commented Nov 6, 2024

glevco Nov 6, 2024

glevco Nov 6, 2024

glevco Nov 6, 2024

glevco Nov 6, 2024

glevco Nov 6, 2024

glevco Nov 6, 2024

glevco Nov 7, 2024

glevco Nov 7, 2024 •

edited

Loading

codecov bot commented Nov 7, 2024 •

edited

Loading

		if endpoint.peer_id is not None and peer is not None:
		assert endpoint.peer_id == peer.id, 'the entrypoint peer_id does not match the actual peer_id'

		if protocol.entrypoint is not None and protocol.entrypoint.peer_id is not None:
		assert protocol.entrypoint.peer_id == peer.id

refactor(p2p): implement PeerConnections [part 2/11] #1174

Are you sure you want to change the base?

refactor(p2p): implement PeerConnections [part 2/11] #1174

Conversation

glevco commented Nov 6, 2024

Motivation

Acceptance Criteria

Checklist

glevco Nov 6, 2024

Choose a reason for hiding this comment

glevco Nov 6, 2024

Choose a reason for hiding this comment

glevco Nov 6, 2024

Choose a reason for hiding this comment

glevco Nov 6, 2024

Choose a reason for hiding this comment

glevco Nov 6, 2024

Choose a reason for hiding this comment

glevco Nov 6, 2024

Choose a reason for hiding this comment

glevco Nov 7, 2024

Choose a reason for hiding this comment

glevco Nov 7, 2024 • edited Loading

Choose a reason for hiding this comment

codecov bot commented Nov 7, 2024 • edited Loading

Codecov Report

glevco Nov 7, 2024 •

edited

Loading

codecov bot commented Nov 7, 2024 •

edited

Loading