CVE-2026-48931 Shouldn't Have Been a CVE
HTTP Response Queue Poisoning in Node.js, the node-fetch Breakage, and Why It Shouldn't Have Been a CVE.
I reported and fixed the HTTP/1.1 response queue poisoning issue in Node.js http.Agent that became CVE-2026-48931, and the Node.js team reviewed it and shipped it through the security process. Looking back, two things about that were mistakes, and one of them broke a lot of people's deploys. This post is me owning the parts that were mine.
The short version:
- The underlying behavior is real and worth hardening against. The guard I added is good defense in depth and should stay.
- Treating it as a vulnerability and pushing it onto the security-release track was the wrong instrument for the problem. This is something HTTP/1.1 does by design, not a bug specific to http.Agent.
- The fix I wrote carried a publicly observable side effect that made node-fetch@2 emit false ERR_STREAM_PREMATURE_CLOSE errors, which cascaded into Google API auth, Firebase, Backstage, and the official Docker images. That one is squarely on me.
What the issue actually is
HTTP/1.1 has no response identifiers. A client matches a response to a request purely by ordering on the connection. On a keep-alive socket that gets returned to a pool and reused, that positional contract is the only thing binding response N to request N.
The attack: an upstream that you talk to writes an extra, unsolicited, syntactically valid HTTP/1.1 response onto the socket after your legitimate response has completed, while the socket is sitting idle in the agent's freeSockets pool. When the client pulls that socket for the next request, it parses the pre-staged bytes as that request's response. From then on, every response on that connection is shifted by one.
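To make the off-by-one concrete, here is a toy model of positional matching. It does not touch the network or Node's http internals. The function name `pairByPosition` and the sample requests are made up for illustration.

```javascript
// Toy model: on HTTP/1.1 the Nth response on the wire is assumed to
// belong to the Nth request. Nothing else binds them together.
function pairByPosition(requests, responsesOnWire) {
  return requests.map((request, i) => ({
    request,
    response: responsesOnWire[i] ?? null,
  }));
}

const requests = ['GET /a', 'GET /b', 'GET /c'];

// Honest upstream: exactly one response per request, in order.
const honest = pairByPosition(requests, ['A', 'B', 'C']);

// Poisoning upstream: an extra response is written while the socket
// sits idle after the response to /a has completed.
const poisoned = pairByPosition(requests, ['A', 'POISON', 'B', 'C']);

console.log(honest.map((p) => `${p.request} -> ${p.response}`));
console.log(poisoned.map((p) => `${p.request} -> ${p.response}`));
```

In the poisoned run, the request for /b is answered with the injected bytes and the request for /c receives the response that was meant for /b.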
Before the fix, core http.Agent did nothing about this. It detached the parser and left the socket paused in the pool with only an error listener. Unsolicited bytes sat silently in the buffer and were consumed on the next reuse. The guard I added (freeSocketDataGuard) attaches a 'data' listener to the idle socket and resumes it, so that any unsolicited byte destroys the socket instead of leaving it in the pool for reuse. Go has had this property for years as a side effect of its always-running readLoop. undici hit the same class of issue in parallel (GHSA-35p6-xmwp-9g52, CWE-367), from the same researcher report.
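The idea behind the guard can be sketched in a few lines. This is a simplified model, not the code in Node.js core: `FakeSocket` and `GuardedPool` are hypothetical stand-ins built on a plain EventEmitter, and `receive()` simulates bytes arriving from the peer.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical stand-in for a TCP socket, with just enough surface for the sketch.
class FakeSocket extends EventEmitter {
  constructor() {
    super();
    this.destroyed = false;
  }
  destroy() {
    this.destroyed = true;
    this.emit('close');
  }
  // Simulates bytes arriving from the peer.
  receive(bytes) {
    if (!this.destroyed) this.emit('data', bytes);
  }
}

// Hypothetical pool: any byte seen while a socket is idle destroys that socket.
class GuardedPool {
  constructor() {
    this.free = [];
  }
  release(socket) {
    const onIdleData = () => {
      this.free = this.free.filter((s) => s !== socket);
      socket.destroy();
    };
    socket.idleGuard = onIdleData;
    socket.on('data', onIdleData);
    this.free.push(socket);
  }
  acquire() {
    const socket = this.free.shift();
    if (!socket) return null;
    // Remove the guard so the next response is parsed normally.
    socket.off('data', socket.idleGuard);
    socket.idleGuard = null;
    return socket;
  }
}

const pool = new GuardedPool();
const quiet = new FakeSocket();
const poisoned = new FakeSocket();
pool.release(quiet);
pool.release(poisoned);

// The upstream writes an unsolicited response onto an idle socket.
poisoned.receive('HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n');

const reused = pool.acquire(); // the quiet socket
const next = pool.acquire();   // nothing left: the poisoned one was destroyed
console.log(reused === quiet, poisoned.destroyed, next);
```

The sketch only covers bytes that arrive while the socket is idle. Bytes that arrive after `acquire()` are outside what this kind of guard can see.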
So far, so reasonable. Where I went wrong is what I called it and how I wrote it.
Why I no longer think this was a vulnerability
Three reasons, in increasing order of how much they should have stopped me.
The trust boundary already sits above this
The CWE-367 framing requires an attacker-controlled or compromised upstream. Once you are talking to a malicious origin, that origin already returns whatever bytes it wants for every request you send it. So the extra thing poisoning gives an attacker is small: it lets that upstream misalign responses on a connection you reused, so the answer to one of your requests comes back as the answer to another. There are setups where that bites, like a connection pool shared across different users or trust levels, or cases where which response pairs with which request really matters because they carry tokens or other secrets. But it all sits underneath a decision you already made. You chose to trust that server when you sent it the request, and a server you should not trust can hurt you in far simpler ways than this. Calling it low severity was fair. My mistake was sliding from "low severity" to "this belongs in the CVE process," as if those were the same call. They are not.
Every HTTP/1.1 client has the same race
This is the part I should have weighted more heavily. I went and looked at how other clients handle the exact same case. Every single one has the same irreducible race. The robust ones close on detectable idle or excess bytes, and that is good behavior, but none of them can bind a response to a request structurally on HTTP/1.1, because the binding is positional.
| Client | Mitigation for idle/excess bytes | Race remains? |
|---|---|---|
| curl / libcurl | Closes the connection on excess bytes after a response ("excess found in a read"), liveness probe before reuse | Yes, detection is timing dependent |
| Go net/http | readLoop detects data on an idle channel and marks the connection broken | Yes, discussed openly in Go issues |
| urllib3 / requests | is_connection_dropped poll at reuse, error paths on malformed framing | Yes, same protocol ambiguity |
| OkHttp | Treats stale bytes as broken-server behavior, suggests Connection: close or pool eviction | Yes, maintainers say there is no fully satisfying generic fix |
| undici / fetch | Advisory and patch for reusable-socket poisoning | Yes in principle, HTTP/1.1 still has no response IDs |
| JDK HttpClient | Pool cleanup closes a connection on "Data received while in pool" | Yes, after checkout the bytes look like the next response |
| Apache HttpClient 5 | Reuse based on framing, keep-alive, full consumption, EOF stale check | Yes, readable extra data is not necessarily treated as stale |
| Jetty HttpClient | Detects unsolicited responses with no active exchange and closes | Yes, race if the next exchange is installed first |
| AsyncHttpClient (Netty) | Closes orphan channels with no associated future | Yes, once a future is attached the bytes are attributed to it |
RFC 9112 section 6.3 already tells clients not to treat data after a final response as a separate response, precisely because doing so enables cache poisoning. The clients that guard against this are implementing robustness against a known protocol hazard. A behavior shared by every conforming HTTP/1.1 client is a protocol property. Filing it as a CVE against http.Agent implies Node.js was uniquely broken here, when it was really just missing a hardening that some peers have and others still do not.
The race cannot be closed, so a "fix" cannot remediate it
This is the cleanest reason. The detection requires reading the socket, and the read is asynchronous to the decision to reuse. TCP is full duplex, so the peer can land bytes in your receive buffer at any instant, independent of your send side. "Observe the socket is idle" and "commit it to request N+1" can never be made atomic with respect to an attacker's writes. Any check-then-use has a gap. A pre-write readability probe does not even bound it tightly, because poison that arrives after the probe but before you read the genuine response still pre-empts it in the buffer.
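The check-then-use gap can be shown with a small timeline model. This is a sketch under simplifying assumptions: time is a plain number, the probe sees only bytes that arrived strictly before it ran, and the parser takes whatever is first in the receive buffer. The function name `reuseWithProbe` and its parameters are invented for illustration.

```javascript
// Timeline model of "probe the idle socket, then commit it to the next request".
function reuseWithProbe({ probeAt, poisonAt, genuineAt }) {
  // Bytes land in the receive buffer in arrival order.
  const arrivals = [
    { at: poisonAt, bytes: 'POISON' },
    { at: genuineAt, bytes: 'GENUINE' },
  ].sort((a, b) => a.at - b.at);

  // The probe can only observe what had already arrived when it ran.
  const seenByProbe = arrivals.filter((a) => a.at < probeAt);
  if (seenByProbe.length > 0) {
    return { reused: false, parsedAsResponse: null };
  }

  // Probe passed, socket committed. The parser reads the buffer front to back.
  return { reused: true, parsedAsResponse: arrivals[0].bytes };
}

// Poison arrives during the idle interval: the probe catches it.
const caught = reuseWithProbe({ poisonAt: 1, probeAt: 5, genuineAt: 10 });

// Poison arrives after the probe but before the genuine response: it wins.
const raced = reuseWithProbe({ poisonAt: 6, probeAt: 5, genuineAt: 10 });

console.log(caught, raced);
```

Moving `probeAt` later narrows the window but never closes it, because any `poisonAt` between the probe and the genuine response still lands first in the buffer.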
The guard moves the exposure from "deterministically exploitable any time during the idle interval" to "exploitable only by winning a scheduling race at the reuse boundary." That is a real and worthwhile improvement, potentially several orders of magnitude in attempts required, which is why response-queue-poisoning write-ups talk about tens of thousands of requests against low-traffic targets. But the floor is strictly positive, not zero. My own regression test admits this: it relies on a setTimeout(0) to let the I/O poll phase run, with a comment that in a real attack there is always time between the poison arriving and the next request. The attack that matters is exactly the one where the adversary collapses that gap on purpose.
A CVE is best reserved for a defect whose fix meaningfully removes the exposure. This was mitigation and hardening, and it belonged in an ordinary release with the usual review and soak, not on the security fast-path.
The fix is still good. Keep it
Let me not overcorrect. The easy version of this post calls the whole thing security theater, and that would be wrong. The deterministic-during-idle case was a genuine gap, and closing it is the right call. If you are on a patched version, do not go out of your way to disable the guard. Just be clear about what it buys you: a reduction in residual risk against a hazard the protocol bakes in, not the removal of that hazard. And if your threat model includes a genuinely hostile upstream, remember that a server able to inject responses onto your idle sockets can almost certainly manipulate the real responses it sends you anyway. Reusing keep-alive connections against it is a risk you are choosing to accept, and the patch does not take it away. The only deterministic answers remain what they always were: do not reuse positionally against untrusted peers, or move to HTTP/2 and HTTP/3, where stream IDs decouple response identity from socket byte position and the whole class evaporates.
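For contrast, here is a toy model of matching by stream identifier instead of by position. It is not an HTTP/2 implementation: the frame shape, the function name `pairByStreamId`, and the way unknown frames are simply set aside are all assumptions made for the sketch. A real stack handles frames on unknown streams according to the HTTP/2 specification.

```javascript
// Toy model: each response frame names the stream it belongs to,
// so arrival order no longer decides which request it answers.
function pairByStreamId(requests, framesOnWire) {
  const pending = new Map(requests.map((r) => [r.streamId, r.path]));
  const matched = {};
  const rejected = [];
  for (const frame of framesOnWire) {
    if (pending.has(frame.streamId)) {
      matched[pending.get(frame.streamId)] = frame.body;
      pending.delete(frame.streamId);
    } else {
      // No request is waiting on this stream: it cannot displace anything.
      rejected.push(frame);
    }
  }
  return { matched, rejected };
}

// Client-initiated HTTP/2 streams use odd identifiers.
const requests = [
  { streamId: 1, path: '/a' },
  { streamId: 3, path: '/b' },
  { streamId: 5, path: '/c' },
];

const result = pairByStreamId(requests, [
  { streamId: 1, body: 'A' },
  { streamId: 7, body: 'POISON' }, // unsolicited, matches no open request
  { streamId: 3, body: 'B' },
  { streamId: 5, body: 'C' },
]);

console.log(result.matched, result.rejected);
```

This only removes the misalignment. A hostile server can still put whatever content it likes on a stream you did open, which is the trust decision discussed above.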
And since this is a property of the protocol rather than a bug, the response that actually lasts is documentation, not more code. I merged a change into undici that documents the HTTP/1.1 keep-alive and pipelining trust tradeoff, and a companion documentation change against Node.js core to clarify HTTP/1.1 response ordering in the official docs. That is the instrument this actually called for: state plainly what positional reuse against an untrusted upstream means, so users can make the trust decision on purpose instead of inheriting it silently.
What I broke, and how
This is the part I am most sorry about.
The guard I wrote used a public 'data' listener on the idle socket. node-fetch@2 inspects socket.listenerCount('data') to infer stream state, and with my listener attached it observed a count greater than zero during response close and reported false ERR_STREAM_PREMATURE_CLOSE errors. Because it landed in a coordinated security release across 22.x, 24.x, and 26.x at once, the blast radius was immediate and wide:
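The side effect is easy to reproduce in isolation. The sketch below uses a plain EventEmitter in place of a real socket, and `looksPrematurelyClosed` is a hypothetical, simplified stand-in for the kind of listener-count heuristic described above, not node-fetch's actual code.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical heuristic: "a 'data' listener is still attached at close,
// so the body must have been cut short".
function looksPrematurelyClosed(socket) {
  return socket.listenerCount('data') > 0;
}

const socket = new EventEmitter(); // stand-in for a net.Socket

const before = looksPrematurelyClosed(socket);

// An idle-socket guard implemented as a public 'data' listener.
const guard = () => {};
socket.on('data', guard);
const withGuard = looksPrematurelyClosed(socket);

// Once the listener is gone, the heuristic goes quiet again.
socket.off('data', guard);
const after = looksPrematurelyClosed(socket);

console.log({ before, withGuard, after });
```

Any code that reads listener counts on the socket can observe the guard, whether or not the guard ever fires.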
- Requests to Google APIs failing with "Premature close" through gaxios and google-auth-library
- Firebase CLI login failures
- Backstage (backstage#34651)
- The official Node.js Docker images (docker-node#2544)
Node.js does have a safety net for exactly this. A project called CITGM, short for Canary in the Gold Mine, pulls a curated lookup table of widely used ecosystem modules from npm and runs their test suites against release candidates, specifically to surface regressions before they ship. It did not flag this one because node-fetch is not in that table. The v2 line is mostly abandoned now, so it is not part of the canary set, even though a large amount of software still pulls it in. The regression slipped through a net built for precisely this kind of breakage, because the module it landed on had already dropped out of the net.
The follow-up, PR #64004, changes the guard to use the socket handle's internal onread hook while the socket is in the free pool, and restores the normal stream read callback on reuse. The guard still destroys sockets that receive unsolicited data while idle. It just no longer adds a publicly observable stream listener. It needed backporting to 22.x and 24.x to actually unblock node-fetch@2 users.
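The shape of that change can be modeled as swapping a low-level read callback instead of adding a listener. This is a hypothetical model and not Node's internals: `ModelSocket`, its `handle.onread` property, `guardWhileIdle`, and `restoreOnReuse` are invented here to illustrate the idea.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical socket model: a low-level hook receives bytes first and,
// by default, forwards them to public 'data' listeners.
class ModelSocket extends EventEmitter {
  constructor() {
    super();
    this.destroyed = false;
    this.streamRead = (bytes) => this.emit('data', bytes);
    this.handle = { onread: this.streamRead };
  }
  destroy() {
    this.destroyed = true;
  }
  // Simulates bytes arriving from the peer.
  receive(bytes) {
    if (!this.destroyed) this.handle.onread(bytes);
  }
}

// While idle: any byte destroys the socket, with no public listener added.
function guardWhileIdle(socket) {
  socket.handle.onread = () => socket.destroy();
}

// On reuse: put the normal stream read callback back.
function restoreOnReuse(socket) {
  socket.handle.onread = socket.streamRead;
}

const idle = new ModelSocket();
guardWhileIdle(idle);
const visibleListeners = idle.listenerCount('data'); // the guard is not a listener
idle.receive('unsolicited');

const reusedSocket = new ModelSocket();
guardWhileIdle(reusedSocket);
restoreOnReuse(reusedSocket);
const received = [];
reusedSocket.on('data', (bytes) => received.push(bytes));
reusedSocket.receive('genuine');

console.log(visibleListeners, idle.destroyed, received);
```

The guard behaves the same as before for unsolicited bytes, but code that inspects listener counts sees no difference between a guarded idle socket and an unguarded one.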
What I should have done
To the people whose deploys this broke: I am sorry. I said it on the PR and I will say it here. The regression came from code I wrote, and I should have caught it before it went out. It cleared review and a coordinated release, so the process did not catch it either, but the listener was mine and the urgency I pushed for is part of why it moved as fast as it did.
And to the Node.js releasers specifically: I am sorry. Correcting this meant cutting an unplanned follow-up minor across the release lines, right after the security release you had already shipped. That was avoidable work, and I am the reason it landed on you.
And this is the kind of regression the ordinary release flow would most likely have caught before it reached most users. Changes that go through the ordinary flow land on the Current line first and only percolate down to Active LTS and Maintenance over the following weeks, after they have soaked in real-world use and run through more CITGM cycles. A security release does not wait. It ships to every supported line on the same day. A regression that the Current line would normally have surfaced long before it reached LTS and Maintenance users instead hit all of them at once.
What I would do differently:
- Separate "is this real" from "does this need a CVE," because this one was real but should have shipped as ordinary hardening rather than a security fix.
- Before any socket-layer guard goes near a security release, check it for side effects other code can observe, like changed listener counts or emitter state.
- Push to keep security releases for the bugs a fix actually closes, not for defense in depth.
None of this is a criticism of the person who reported it. Reporting it was correct and useful, and the report is what produced a guard that genuinely makes Node.js more robust. The judgment calls that went wrong, the classification and the rushed implementation, are mine.
Why this happens, and why human in the loop is not enough
This misclassification was not bad luck. It is what security maintenance looks like right now, and it is worth explaining.
Look at the numbers from the Node.js project. For most of 2025 we got single digits of security reports a month. Then it jumped: 21 in December, 35 in January, 58 in February, 64 in March. Roughly triple in three months, and still going up. The quality went the other way. Of 264 reports submitted and 262 closed, 32 turned out to be real, actionable issues, about 12 percent. Another 172, two thirds of the total, were closed as informative. The rest were duplicates, not applicable, or spam. Most of what arrives is not a vulnerability, and the small fraction that is gets buried in a pile that keeps growing.
A year ago a lot of reports were easy to dismiss. Incoherent writeup, code that does not exist, no proof of concept, a CWE that does not fit, close and move on. That does not work anymore. AI writes reports that pass every check that is cheap to run. The CWE is right, the code path is real, the proof of concept runs, the RFC quote is accurate. Nothing on the surface tells you whether you are looking at a real finding or a generated one.
What is left is the part that is expensive to do: working out whether the threat model is real or just asserted, whether the bug reproduces anywhere outside a lab, whether the fix closes anything or only papers over something the protocol allows, and whether it is even a bug rather than how every client in this space already behaves. You cannot read that off the report. You get there by knowing the code and going slowly, which usually takes longer than it took the reporter to generate the thing in the first place. That work does not speed up as the queue grows. It is capped by how many expert hours you have, and those are not growing.
A few months ago I wrote a post called The Human in the Loop. The point was that the human is there to provide judgment, to catch the moments where the model produces something that looks right and is not, and that you cannot hand your accountability to a model. I still think that holds when you are the one generating code. What this taught me is that the receiving end is harder. When more plausible, defensible-looking reports come in than there are hours to look at them properly, nothing breaks in an obvious way. The review just gets shallower. A report that passes all the cheap checks gets waved through, and once in a while, like here, it gets waved all the way into a CVE.
I want to be careful about CVE-2026-48931 here. The report was good. The behavior was real, the analysis held up, the proof of concept worked, and the person who filed it did the right thing. I am not calling it noise. The uncomfortable part is the reverse: a correct, well-argued report used to be a strong sign that something deserved a CVE, and that is no longer true, because anyone can produce a correct, well-argued report now. What this one needed was for someone to step back and say it is a timing race HTTP/1.1 cannot get rid of, every client has it, the fix hardens rather than removes it, so it should not be a CVE. That step back is the scarce thing, and a flood of good-looking reports is exactly what eats it.
I am the worst case for the optimistic version of this story, because I had every advantage. I maintain undici and the HTTP stack in Node.js core. I have read RFC 9112 and the ones before it closely enough to argue their corner cases, and I know exactly why this race cannot be closed. If anyone was positioned to read this report and say it was hardening and not a vulnerability, it was me, and I still got it wrong. If knowing the material were the thing that saves you, it would have saved me, and it did not. That is why I think this is structural and not a knowledge gap. The flood does not get past you because you are not good enough. It gets past you when you are good enough, and busy, and on the fortieth believable report that month.
So human in the loop is necessary but not enough. Putting a person at the end of the pipeline does not buy you much if the queue grows faster than that person can think about each item in it. What has to scale is the judgment per report, and that is the one thing going the wrong way.
References
- CVE-2026-48931, Node.js June 18, 2026 security releases: https://nodejs.org/en/blog/vulnerability/june-2026-security-releases
- Original guard commit (freeSocketDataGuard): https://github.com/nodejs/node/commit/bc0b53813e
- Breakage report (node-fetch ERR_STREAM_PREMATURE_CLOSE): https://github.com/nodejs/node/issues/63989
- Follow-up fix, avoid stream listeners on idle agent sockets (PR): https://github.com/nodejs/node/pull/64004
- Follow-up fix, landed commit: https://github.com/nodejs/node/commit/6577d3b28225d0597e001a0f99942081057b0b82
- Downstream breakage, Backstage: https://github.com/backstage/backstage/issues/34651
- Downstream breakage, docker-node: https://github.com/nodejs/docker-node/issues/2544
- Node.js CITGM (Canary in the Gold Mine): https://github.com/nodejs/citgm
- undici advisory (parallel issue): https://github.com/nodejs/undici/security/advisories/GHSA-35p6-xmwp-9g52
- Documentation, undici keep-alive trust tradeoff: https://github.com/nodejs/undici/pull/5457
- Documentation, Node.js core HTTP/1.1 response ordering: https://github.com/nodejs/node/pull/64213
- Go net/http transport (readLoop): https://go.dev/src/net/http/transport.go
- curl, excess bytes close the connection: https://github.com/curl/curl/issues/13201
- RFC 9112 section 6.3, Message Body Length: https://www.rfc-editor.org/rfc/rfc9112.html#section-6.3
- The Human in the Loop: https://adventures.nodeland.dev/archive/the-human-in-the-loop/
- PortSwigger, HTTP/1.1 must die: the desync endgame: https://portswigger.net/research/http1-must-die