Infrastructure tracker #40721

aturon · 2017-03-21T23:25:38Z (edited by Mark-Simulacrum)

Status

Monitoring

Homu queue

Known issues

CI: currently experiencing a variety of problems that are causing the PR queue to back up:
- Travis outages and other problems on macOS.
  - We are reporting these to Travis as they occur, and are told they are working on it, but the problems are greatly delaying our PR testing.
- sccache bugs
  - @alexcrichton is currently focused full-time on debugging sccache, the S3-based ccache clone written in Rust that we use to cache LLVM builds.
- Spurious build failures
  - @alexcrichton has worked hard to track these down, but would welcome help from any intrepid Rustaceans who want to take on the challenge.

Help wanted

All infrastructure issues

Easy

- Check the PR queue for old PRs that have yet to be reviewed, and ping the reviewer on IRC or elsewhere. (Yes, you can and should do this!)
- Check the PR queue for build failures, find the failed build, and extract the relevant information into a comment on the PR.
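For the second task, a small helper can make pulling the relevant lines out of a long build log less tedious. The following is only a sketch: `extract_failures` and `FAILURE_PATTERN` are hypothetical names, and the patterns are guesses that would need tuning to what the failing builder actually prints.

```python
import re

# Hypothetical patterns (an assumption, not an official list of failure markers).
FAILURE_PATTERN = re.compile(r"\berror\b|\bFAILED\b|\bpanicked at\b")


def extract_failures(log_text, context=1):
    """Return the lines matching FAILURE_PATTERN, plus `context` lines around each.

    Lines are returned in their original order, with overlapping windows merged.
    """
    lines = log_text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if FAILURE_PATTERN.search(line):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            keep.update(range(start, end))
    return [lines[i] for i in sorted(keep)]
```

Given the raw log of a failed build saved to a local file, `"\n".join(extract_failures(open("build.log").read()))` would produce a short excerpt suitable for pasting into a PR comment; the file name here is just an example.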

Medium

Hard

Spurious build failures. These are currently extremely difficult to track down, due to the inability to reproduce locally or to log into the build machines. They are thus also very high value bugs to close. Contact @alexcrichton if you'd like to give it a shot.

Infrastructure projects

CI + releases. Currently set up via Travis + AppVeyor, with some additional infrastructure in Rust Central Station to monitor and control the builds.
- Maintained by @alexcrichton
Rust Central Station. Oversees CI/releases and nagbox. Set up using Docker.
- Maintained by @alexcrichton
homu. The bot behind @bors. Hooks into the above CI infrastructure to actually land PRs.
rfcbot. A bot for managing the FCP process of RFCs and tracking issues.
- Maintained by @dikaiosune
rusty-dash. A dashboard tracking a number of metrics for Rust and its community.
- Maintained by @dikaiosune
highfive. A bot that welcomes new contributors and randomly assigns reviewing duties.
- Maintained by @nrc.
nagbot. A bot for sending email reminders to the Rust subteams about reviewing duties.
- Maintained by @aturon
rustbuild. The x.py build system for the Rust compiler.
- Written and maintained by @alexcrichton.
play. The infrastructure behind https://play.rust-lang.org/
perf. Performance monitoring for the Rust compiler.
- Maintained by @nrc and @Mark-Simulacrum


aturon · 2017-03-21T23:31:43Z

To folks who tend to monitor the PR queue or otherwise help out with infrastructure: for the time being, I'd like to try using this issue to centralize some tracking of what's going on with infrastructure, and ways people can get involved. Right now too much of this work is falling on too few shoulders (cough @alexcrichton cough) and we need to work on spreading it out.

If you see something amiss with any piece of infrastructure, please take a look at the status page on the top here to see if the issue is known. If it's not, open a new A-infrastructure issue and leave a comment with a link. When in doubt, leave a comment here. Similarly, if you want to help out but don't know how, leave a comment here.

cc @rust-lang/compiler @rust-lang/libs @frewsxcv @TimNN @Mark-Simulacrum @erickt @edunham @japaric @est31 @durka

bstrie · 2017-03-22T04:59:32Z

perf: currently down. (TODO: provide details)

@aturon Is perf.rlo updating? At the bottom it says "Updated as of: 12/30/2016, 1:24:27 PM"

nrc · 2017-03-22T19:43:25Z

Is perf.rlo updating?

No. @Mark-Simulacrum has been working on some improvements that should get it going again.

alexcrichton · 2017-03-23T20:30:33Z

One longstanding source of spurious failures is general network errors. I've opened an issue that I believe will help mitigate at least one instance of them, and help implementing it would be greatly appreciated!
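The linked issue's details are not reproduced in this thread, so the following is not the proposed fix. It is only a generic sketch of one common way to soften transient network errors: retrying the operation with exponential backoff. The name `retry` and all of its parameters are assumptions made for illustration.

```python
import time


def retry(operation, attempts=3, base_delay=1.0, retry_on=(OSError,), sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` tries are used up.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... seconds between tries.
    Only exceptions listed in `retry_on` are retried; anything else propagates
    immediately. `sleep` is injectable so the delays can be observed in tests.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of tries: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
```

Wrapping only the network-bound step (for example, a download) keeps genuine build errors from being retried and masked.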

alexcrichton · 2017-03-24T20:15:47Z

It looks like OSX cycle time for i686-apple-darwin has regressed 20% recently, and unfortunately I'm not sure how to explain it :(

frewsxcv · 2017-03-30T00:38:09Z

Looks like all appveyor builds are currently failing: #40694 (comment)

aidanhs · 2017-03-30T00:40:40Z

My fault :( Back to the drawing board...
Partial rollback to unblock appveyor already r+ed and the build has already got past the part that was blocked.

frewsxcv · 2017-03-31T01:53:16Z

Seemingly unrelated to the previous couple messages, the past few attempted PRs failed with the same error message on MSYS_BITS=32 on appveyor:

= note: "gcc" "-Wl,--enable-long-section-names" "-fno-use-linker-plugin" "-Wl,--nxcompat" "-nostdlib" "-Wl,--large-address-aware" "C:\\projects\\rust\\build\\i686-pc-windows-gnu\\stage1\\lib\\rustlib\\i686-pc-windows-gnu\\lib\\crt2.o" "C:\\projects\\rust\\build\\i686-pc-windows-gnu\\stage1\\lib\\rustlib\\i686-pc-windows-gnu\\lib\\rsbegin.o" "-L" "C:\\projects\\rust\\build\\i686-pc-windows-gnu\\stage1\\lib\\rustlib\\i686-pc-windows-gnu\\lib" "C:\\projects\\rust\\build\\i686-pc-windows-gnu\\stage1-std\\i686-pc-windows-gnu\\release\\deps\\collectionstest-cf7d8872f6b0686a.0.o" "-o" "C:\\projects\\rust\\build\\i686-pc-windows-gnu\\stage1-std\\i686-pc-windows-gnu\\release\\deps\\collectionstest-cf7d8872f6b0686a.exe" "-Wl,--gc-sections" "-nodefaultlibs" "-L" "C:\\projects\\rust\\build\\i686-pc-windows-gnu\\stage1-std\\i686-pc-windows-gnu\\release\\deps" "-L" "C:\\projects\\rust\\build\\i686-pc-windows-gnu\\stage1-std\\release\\deps" "-L" "C:\\projects\\rust\\build\\i686-pc-windows-gnu\\stage1\\lib\\rustlib\\i686-pc-windows-gnu\\lib" "-Wl,-Bstatic" "-Wl,-Bdynamic" "-L" "C:\\projects\\rust\\build\\i686-pc-windows-gnu\\stage1\\lib\\rustlib\\i686-pc-windows-gnu\\lib" "-l" "test-51dd1e12bb8fc6c0" "-L" "C:\\projects\\rust\\build\\i686-pc-windows-gnu\\stage1\\lib\\rustlib\\i686-pc-windows-gnu\\lib" "-l" "term-9bb4a9959ced7ebc" "-L" "C:\\projects\\rust\\build\\i686-pc-windows-gnu\\stage1\\lib\\rustlib\\i686-pc-windows-gnu\\lib" "-l" "getopts-83c65310844a796d" "-L" "C:\\projects\\rust\\build\\i686-pc-windows-gnu\\stage1\\lib\\rustlib\\i686-pc-windows-gnu\\lib" "-l" "std-4d6881ec6132b951" "C:\\projects\\rust\\build\\i686-pc-windows-gnu\\stage1\\lib\\rustlib\\i686-pc-windows-gnu\\lib\\libcompiler_builtins-7ac5a34e9b48514f.rlib" "-l" "kernel32" "-l" "advapi32" "-l" "ws2_32" "-l" "userenv" "-l" "shell32" "-l" "gcc_eh" "-lmingwex" "-lmingw32" "-lgcc" "-lmsvcrt" "-luser32" "-lkernel32" 
"C:\\projects\\rust\\build\\i686-pc-windows-gnu\\stage1\\lib\\rustlib\\i686-pc-windows-gnu\\lib\\rsend.o"
  = note: collect2.exe: error: ld returned 5 exit status

aidanhs · 2017-03-31T03:20:04Z

Similar to something lots of people using arduino saw? The resolving PR is fairly opaque about what the actual fix was though.

alexcrichton · 2017-03-31T05:45:13Z

@frewsxcv let's try tracking those here: #40906

It's likely related to the 4.9.3 -> 6.2.0 mingw upgrade

Mark-Simulacrum · 2017-04-01T23:46:24Z

Going to provide a summary of the current perf.rlo situation as I know it (cc @nikomatsakis, who I've talked to about this).

The current collection infrastructure is broken, for relatively unknown reasons, and I've deemed it hard to fix and sufficiently difficult to maintain that it needed a rewrite. That work has been started here: https://github.com/Mark-Simulacrum/rustc-perf-collector. The project works for the collection side of things (though it does not upload results to github), but it has not been integrated into the HTTP server for perf.rlo. I've been meaning to devote some time to this, as I don't expect it to be all that hard, but haven't quite gotten around to it yet, and we (Niko and myself) have come up with a few potential roadblocks to getting it started.

A number of the roadblocks are discussed and summarized in this internals post.

To summarize the current situation:

- Collection based on downloading artifacts that are built as PRs land is implemented, and ~works.
- It's unknown exactly what we want to collect (see the internals post).
- Frontend (perf.rlo) does not work with the new collection.

What needs to be done to get perf.rlo working once more:

- Pruning and potential additions to the benchmark suite
- New collection infrastructure running on the benchmark server
- perf.rlo's backend portion updated to work with the new collection output

Let me know of any questions; I'd be happy to answer them.

aturon · 2017-04-03T19:49:19Z

An outage related to the CentOS EOL is being discussed in the #rust-infra channel.

3:39 PM so typically we cache docker images on travis
3:39 PM those caches seem to have been cleared
3:39 PM so we're trying to rebuild all our docker images on each pr now
3:39 PM one image is the centos 5 image that we build releases inside of
3:39 PM x86 and x86_64 images
3:39 PM so apparently centos EOL'd a couple days ago
3:40 PM and they appear to have flat out deleted info from their servers
3:40 PM so that docker container will no longer build
3:40 PM this means that everything is frozen until we fix that
3:40 PM possible strategies are:
3:40 PM a) figure out how to get the image building again
3:40 PM b) figure out how to build an older glibc somewhere else
3:40 PM c) bite the bullet and increase our glibc requirement
3:41 PM obviously (c) is the easiest
3:41 PM yet it's the highest impact b/c I have no idea what would break as a result
3:41 PM I have no idea how to do (a) and (b)
3:41 PM I'm currently investigating

aturon · 2017-04-03T19:51:18Z

Note: the previous comment is about a general outage for the queue.

frewsxcv · 2017-04-04T04:02:53Z

Regarding the previous comments, that issue has been resolved via #41045, though it (and other PRs) appear to be struggling to land because we keep hitting the three hour mark on Travis.

mrhota · 2017-04-04T04:21:21Z

every time we retry, everything starts over from the beginning, as if none of the prelim build prep, llvm build, and other not-actually-rustc-building happened

there has to be a better way

aidanhs · 2017-04-05T23:05:23Z

https://travis-ci.org/rust-lang/rust/builds/219024973 if you look at the two osx builders that took >2hr30min (!) you'll see that the logs have been truncated. While it was still building, opening the page got the truncated log, then new logs (e.g. test output) were streamed to me. Refreshing truncated it back down again.

Didn't cause a build failure, but maybe worth being aware of - I can imagine this being very annoying if the build had failed with truncated logs.

alexcrichton · 2017-04-05T23:12:01Z

@aidanhs yeah I've found the output to sometimes be confusing on Travis. The raw logs at least appear to not be truncated?

Note that we've got a separate issue for how slow OSX is

aidanhs · 2017-04-05T23:16:19Z

Odd, I definitely checked that and they were truncated too (or I wouldn't have mentioned it). I guess it was a blip that corrected itself, which is a relief.

alexcrichton · 2017-04-05T23:18:10Z

Heh I think I've definitely noticed that as well before, it sometimes just corrects itself ...

larsbergstrom · 2017-04-06T15:39:40Z

@nrc Is there any desire to merge your changes to highfive into the upstream servo/highfive repo? We've done a lot of work since your fork, and AFAIK there aren't any things in yours that were overly rustaceous and non-upstreamable.

nrc · 2017-04-06T23:04:52Z

@larsbergstrom I haven't looked at the Servo highfive for ages, but the last time I checked, the two had diverged considerably and merging would be very non-trivial. I have nothing against doing so, but it seems pretty low priority and is quite a lot of work so I can't see it actually happening.

frewsxcv · 2017-04-11T20:19:17Z

It looks like the CentOS 'vault' no longer has packages related to CentOS 5, so our builds will fail until we find a resolution:

frewsxcv · 2017-04-11T21:03:47Z

To follow up with my previous comment, it turns out we were using the wrong 'vault' URL path. The fix is in #41231.

steveklabnik · 2018-12-10T18:37:42Z

Triage ping. Not sure if this issue is still valid or worth it.

Mark-Simulacrum · 2018-12-11T03:35:58Z

Nominating for infra team discussion; I personally support closing this issue -- I don't think a tracking issue like this adds much to our work.

aturon added A-infrastructure metabug Issues about issues themselves ("bugs about bugs") labels Mar 21, 2017

Mark-Simulacrum added T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. and removed A-infrastructure labels Jun 25, 2017

Mark-Simulacrum added the C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC label Jul 27, 2017

Mark-Simulacrum added the I-nominated label Dec 11, 2018

aidanhs removed the I-nominated label Dec 11, 2018

aidanhs closed this as completed Dec 11, 2018
