
Strange freezing behavior of unidirectional Windows named pipes on single-threaded runtime #5170

Closed
kotauskas opened this issue Nov 5, 2022 · 7 comments · Fixed by #5208
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-net Module: tokio/net

Comments

@kotauskas

Version
Tested on latest 1.21.2 and also on 1.8.0

Platform
64-bit Windows 10, version 10.0.19043 N/A Build 19043

Description

The issue occurs with the following snippet (I believe this is the smallest MCVE I can come up with; I don't see how to reduce it any further):

use {
    std::{io, time::Duration},
    tokio::{
        io::{AsyncBufReadExt, AsyncWriteExt, BufReader},
        net::windows::named_pipe::{ClientOptions, NamedPipeServer, ServerOptions},
        sync::oneshot::{channel, Sender},
        task,
        time::sleep,
    },
};

static PATH: &str = r"\\.\pipe\Tokio inbound bug MCVE";
static MSG: &str = "Hello from server!\n";
const NUM_CLIENTS: usize = 16;
// Toggling this to `true` makes the snippet run correctly on a single-threaded
// runtime as well (see the description below).
const INBOUND: bool = false;

pub async fn server(snd: Sender<()>) -> io::Result<()> {
    async fn handle_conn(mut conn: NamedPipeServer) -> io::Result<()> {
        conn.write_all(MSG.as_bytes()).await?;
        drop(conn);

        Ok(())
    }

    let mut srv = ServerOptions::new()
        .access_inbound(INBOUND)
        .access_outbound(true)
        .first_pipe_instance(true)
        .create(PATH)?;

    let _ = snd.send(());

    let mut tasks = Vec::with_capacity(NUM_CLIENTS);

    for _ in 0..NUM_CLIENTS {
        srv.connect().await?;
        let new_srv = ServerOptions::new()
            .access_inbound(INBOUND)
            .access_outbound(true)
            .create(PATH)?;
        let old_srv = std::mem::replace(&mut srv, new_srv);
        let task = task::spawn(handle_conn(old_srv));
        tasks.push(task);
    }
    for task in tasks {
        task.await??;
    }

    Ok(())
}
pub async fn client() -> io::Result<()> {
    let mut buffer = String::with_capacity(128);

    // Retry while all pipe instances on the server are busy (ERROR_PIPE_BUSY).
    let mut conn = loop {
        match ClientOptions::new().write(false).open(PATH) {
            Err(e) if e.raw_os_error() == Some(winapi::shared::winerror::ERROR_PIPE_BUSY as _) => {
                sleep(Duration::from_millis(10)).await;
                continue;
            }
            not_busy => break not_busy,
        }
    }
    .map(BufReader::new)?;

    conn.read_line(&mut buffer).await?;

    assert_eq!(buffer, MSG);

    Ok(())
}

#[tokio::main]
async fn main() {
    let (tx, rx) = channel();
    let srv = task::spawn(server(tx));
    let mut tasks = Vec::with_capacity(NUM_CLIENTS as _);
    let _ = rx.await;
    for _ in 0..NUM_CLIENTS {
        tasks.push(task::spawn(client()));
    }
    for task in tasks {
        task.await.unwrap().unwrap();
    }
    srv.await.unwrap().unwrap();
}

On a single-threaded runtime, the program freezes as if it had encountered a deadlock. (The frozen state is actually a busy loop: the process pins a single core at 100% utilization.) If the INBOUND constant is set to true, the snippet runs fine.

On a multi-threaded runtime, the program runs correctly in both situations (timely termination with no output).
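
For clarity, "single-threaded runtime" above means Tokio's current-thread flavor; the plain #[tokio::main] in the MCVE defaults to the multi-threaded runtime. A minimal sketch of switching the MCVE to the runtime on which the freeze occurs:

// Sketch: selects the single-threaded (current-thread) runtime; the body of
// main() stays exactly as in the MCVE above.
#[tokio::main(flavor = "current_thread")]
async fn main() {
    // ... same body as the MCVE's main() ...
}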

I assume that the freezing isn't intended behavior because spawn_blocking() isn't called anywhere in the program, but please let me know if I got something wrong and wrote code that cannot work on a single-threaded runtime by design. Thanks in advance!

kotauskas added the A-tokio (Area: The main tokio crate) and C-bug (Category: This is a bug.) labels on Nov 5, 2022.
kotauskas added a commit to kotauskas/interprocess that referenced this issue on Nov 5, 2022: "Seems like a Tokio bug to me. Currently filed as tokio-rs/tokio#5170."
Darksonn added the M-net (Module: tokio/net) label on Nov 5, 2022.
@Darksonn
Contributor

Darksonn commented Nov 5, 2022

Does tokio-console report any tasks being constantly busy?
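
(For reference, a minimal sketch of hooking the MCVE up to tokio-console, assuming the console-subscriber crate is added as a dependency and the binary is built with RUSTFLAGS="--cfg tokio_unstable":)

// Sketch only: console_subscriber::init() starts the instrumentation layer
// that the tokio-console TUI connects to; it requires the tokio_unstable cfg.
#[tokio::main(flavor = "current_thread")]
async fn main() {
    console_subscriber::init();
    // ... same body as the MCVE's main() ...
}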

@kotauskas
Author

No, it appears to be a hot loop of not doing a whole lot, not even polling any of the tasks:

[tokio-console screenshot showing the task list]

One observation that might set us on the right track: the client tasks have never been polled even once. The 74:15 task is the server, and the sixteen 78:20 tasks are the clients.

@satakuma
Member

This code spins inside NamedPipeServer::connect. It turns out that with access_inbound set to false, mio will spuriously return a WRITABLE event and then wait for an actual connection. However, the current implementation doesn't clear the readiness state, so connect spins forever.

pub async fn connect(&self) -> io::Result<()> {
    loop {
        match self.io.connect() {
            Ok(()) => break,
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => {
                self.io.registration().readiness(Interest::WRITABLE).await?;
            }
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

I think changing connect to the following will fix this issue.

pub async fn connect(&self) -> io::Result<()> {
    self.io.registration().async_io(Interest::WRITABLE, || self.io.connect()).await?;
    Ok(())
}

@Thomasdezeeuw
Contributor

@Darksonn does the suggestion by @satakuma solve the issue, or do I need to look into the Mio side of things?

Also /cc @carllerche who wrote the implementation.

@Darksonn
Contributor

I was just about to say that the current implementation is equivalent to what async_io does, but I now see that it is missing a clear_readiness call... This probably isn't a mio bug after all then.
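	
(For illustration, the missing piece amounts to clearing the cached readiness when connect() reports WouldBlock; roughly, paraphrasing Tokio's internal Registration API, the current loop desugared to what async_io does would look like this:)

// Sketch only, paraphrasing internal Tokio APIs: clear the cached readiness
// whenever the operation reports WouldBlock, so a spurious WRITABLE event
// from mio cannot keep connect() spinning.
pub async fn connect(&self) -> io::Result<()> {
    loop {
        let event = self.io.registration().readiness(Interest::WRITABLE).await?;
        match self.io.connect() {
            Ok(()) => return Ok(()),
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => {
                // This is the call the current implementation is missing;
                // async_io performs it automatically.
                self.io.registration().clear_readiness(event);
            }
            Err(e) => return Err(e),
        }
    }
}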

@Darksonn
Contributor

@satakuma I tried applying your suggestion, but it seems that it isn't working and the Windows CI seems to be stuck. Do you happen to know why? I don't have a Windows machine available, so it's quite difficult for me to test this myself.

@satakuma
Member

I apologize, I should have tested it properly. It seems that with my suggestion, connect works with inbound = false but stops working with inbound = true, which is the case in the tests. It probably comes from the fact that we have to issue a connect on the internal mio_windows::NamedPipe first, and only then wait for readiness, clearing it after a failed attempt.
I opened a duplicate PR with a fix, which should pass the Windows CI this time.
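
(A rough sketch of that shape, assuming Tokio's internal Registration::async_io helper and not necessarily the exact code in the fix PR:)

// Sketch only: issue the connect first, and only fall back to waiting on
// WRITABLE readiness when it would block; async_io waits, retries the
// closure, and clears the readiness again after each failed retry.
pub async fn connect(&self) -> io::Result<()> {
    match self.io.connect() {
        Ok(()) => Ok(()),
        Err(e) if e.kind() == io::ErrorKind::WouldBlock => {
            self.io
                .registration()
                .async_io(Interest::WRITABLE, || self.io.connect())
                .await
        }
        Err(e) => Err(e),
    }
}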
