The order of files in your ext4 filesystem does not matter

65 points by thewisenerd 3 months ago

userbinator 3 months ago

> the actual argument value the JVM receives is "/jars/*", and in turn decides to be helpful, and expand the wildcard anyway

Whenever I see such things, I immediately think "whatever the resulting order is, it had better not matter"; and if it does, which is definitely true for Java classpaths, I consider it a bug that needs to be fixed ASAP, before it causes what happened in the article.
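userbinator's point can be made concrete. The JDK's own documentation says the order in which JAR files in a wildcard-expanded classpath directory are enumerated is not specified. Below is a minimal sketch of such an expansion using only `java.nio.file`. `WildcardExpansion` is a hypothetical name, and this is not the launcher's actual code. It only illustrates that the resulting order is whatever the directory stream hands back.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of expanding a classpath entry such as "/jars/*".
// NOT the java launcher's real implementation; it only shows where the order comes from.
final class WildcardExpansion {
    // Collects *.jar and *.JAR entries directly inside dir (no recursion),
    // in whatever order the directory stream yields them. Nothing is sorted.
    static List<Path> expand(Path dir) throws IOException {
        List<Path> jars = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (name.endsWith(".jar") || name.endsWith(".JAR")) {
                    jars.add(entry);
                }
            }
        }
        return jars;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        // Run this on two machines holding the same jars: the printed order may differ.
        for (Path jar : expand(dir)) {
            System.out.println(jar.getFileName());
        }
    }
}
```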

kevincox 3 months ago

Yeah, my jaw was dropping as I realized how far they went with this, from checking their mount options to reading ext4 source code. Directory order is almost always an implementation detail (I'm pretty sure it is on ext4), and even if it isn't, you still shouldn't rely on it (for when someone decides to migrate your production machines to BTRFS because they want snapshots and now your app has some weird breakage). The problem is that the app depends on directory order, and you need to fix that, not figure out how to predict the directory order.
Maybe there should be a mount option to randomize directory order that people can use in their staging environments.
- MrDrMcCoy 3 months ago
  
  > Maybe there should be a mount option to randomize directory order that people can use in their staging environments.
  The behavior I've witnessed suggests that the order is based on inode numbering, which is initially sequential from creation time, and drifts semi-randomly as inodes are unlinked and reused. I don't know this for a fact, but it makes enough sense. Directory ordering should be assumed to be random in all cases, as you suggest.
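Without a mount option like the one kevincox suggests, a staging or test build can approximate it at the application level. The sketch below is minimal and assumes the code under test reads directories through a helper you control. `ShuffledListing` and its seed parameter are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical test helper: shuffles a directory listing so that code depending
// on readdir order breaks in staging instead of in production.
final class ShuffledListing {
    static List<Path> list(Path dir, long seed) throws IOException {
        List<Path> entries;
        try (Stream<Path> s = Files.list(dir)) {
            entries = s.collect(Collectors.toCollection(ArrayList::new));
        }
        // Sort first so the shuffle depends only on the seed, not on readdir order;
        // a failing order can then be replayed from the logged seed.
        Collections.sort(entries);
        Collections.shuffle(entries, new Random(seed));
        return entries;
    }

    public static void main(String[] args) throws IOException {
        long seed = System.nanoTime();
        System.out.println("seed=" + seed); // log it so a bad order can be reproduced
        for (Path p : list(Paths.get(args.length > 0 ? args[0] : "."), seed)) {
            System.out.println(p.getFileName());
        }
    }
}
```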
ivanjermakov 3 months ago

Also, command line strings are limited to 128 KiB on Linux: https://unix.stackexchange.com/questions/120642/what-defines...
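The following is a rough pre-flight check for the case where every jar is spelled out explicitly instead of passed as a wildcard. It assumes the per-string cap is 32 pages (128 KiB with 4 KiB pages) including the terminating NUL byte, and that the argument is encoded as UTF-8. `ClasspathLength` is a hypothetical helper.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical check: would this classpath fit in a single argv string?
final class ClasspathLength {
    // Assumption: Linux's per-string cap (MAX_ARG_STRLEN) is 32 pages,
    // i.e. 128 KiB with 4 KiB pages, and it counts the trailing NUL byte.
    static final int MAX_ARG_STRLEN = 32 * 4096;

    static boolean fitsInOneArgument(List<String> entries) {
        String cp = String.join(File.pathSeparator, entries);
        // Assumption: the argument reaches the kernel as UTF-8 bytes.
        int bytes = cp.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for NUL
        return bytes <= MAX_ARG_STRLEN;
    }

    public static void main(String[] args) {
        List<String> jars = List.of("/jars/app.jar", "/jars/bcprov-jdk18on.jar");
        System.out.println(fitsInOneArgument(jars));
    }
}
```

When the wildcard reaches the JVM unexpanded, as it did in the article, the expanded list never passes through argv, so this limit only matters once the jars are listed by hand.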

wpollock 3 months ago

Counting on the order of files to support multiple versions of jars was never a good idea. Java has had multi-release jar files for your use case since Java 9. See <https://docs.oracle.com/en/java/javase/24/docs/api/java.base...>.

Since duplicates on the classpath don't cause problems, a quick & dirty fix is to manually list versioned jars first, in order, then the jars/* argument.
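Here is a variant of wpollock's fix that drops the wildcard altogether. Pinned jars go first, in the given order, followed by every other jar sorted by file name. It is a sketch under those assumptions, and `DeterministicClasspath` is a hypothetical helper.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: builds an explicit classpath instead of "/jars/*".
final class DeterministicClasspath {
    static String build(Path jarDir, List<Path> pinnedFirst) throws IOException {
        // Keeps insertion order; a jar added twice (pinned and listed) appears once.
        Set<Path> ordered = new LinkedHashSet<>();
        for (Path p : pinnedFirst) {
            ordered.add(p.toAbsolutePath().normalize());
        }
        try (Stream<Path> s = Files.list(jarDir)) {
            s.filter(p -> p.getFileName().toString().endsWith(".jar"))
             .map(p -> p.toAbsolutePath().normalize())
             // Sort by file name so the result no longer depends on readdir order.
             .sorted(Comparator.comparing((Path p) -> p.getFileName().toString()))
             .forEachOrdered(ordered::add);
        }
        return ordered.stream()
                .map(Path::toString)
                .collect(Collectors.joining(File.pathSeparator));
    }

    public static void main(String[] args) throws IOException {
        // Usage: DeterministicClasspath <jar-dir> [pinned.jar ...]
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> pinned = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            pinned.add(Paths.get(args[i]));
        }
        System.out.println(build(dir, pinned));
    }
}
```

Sorting alone doesn't pick the right jar. By name, `bcprov-jdk14` sorts before `bcprov-jdk18on`, so the pinned list does the real work. What sorting buys is that a wrong order fails the same way on every node, instead of depending on each filesystem's hash seed.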

yjftsjthsd-h 3 months ago

> there was a client library that needed a Bouncy Castle “provider” with a version “jdk15”+ as the client initialization used specific properties from a class, and those properties were only available in “jdk15”+.

> up until the node image update, we “fortunately” had node images with directory hash seeds ordering “jdk15” or “jdk18” before “jdk14”.

So the actual bug is that something needing jdk15+ should either retry or be deterministically fed a valid file, right? And this whole article is figuring out why the filesystem coincidentally masked it by accidentally always happening to hand it a file with what it needed?

amiga386 3 months ago

> something needing jdk15+
Actually, no, that "15" refers to Java 1.5, aka Java 5, released 2004. Bouncy Castle has some funky variants, specifically for Java 1.1, 1.2, 1.3, 1.4, 5, 6, 7, 8. All you actually need is the Bouncy Castle for Java 8 onwards, which covers pretty much all versions of Java in use today.
The bug is that multiple providers of Bouncy Castle don't cleanly work when in the classpath together. The authors of Bouncy Castle aren't changing that, because they're like "use our software correctly, please". It's not Java's fault, you can only make classes that don't work on old versions of the JDK, you can't make new Java somehow notice you've included a jar written specifically for an old version of the JVM.
Java did introduce the ability to create multi-release jar files, where you can have JDK-version-specific classes/resources in one jar file... but only from Java 9 onwards. All this mixing and matching by filename that Bouncy Castle uses is for Java 1.1 - Java 1.8 only.
You can also mix and match and cause failure by using one of the Bouncy Castle JCE provider variants with the wrong corresponding "pkix", "util", "mail" jars (extra jars for all the things you might want to do with cryptography that _aren't_ part of the standardised Java Cryptography Extensions API that the main "provider" jar implements). And you can also mess up by mixing FIPS-approved BC with FIPS-not-approved BC.
You only need one set of jars:
* If you don't need FIPS approval: bcprov-jdk18on, bcutil-jdk18on, bcpkix-jdk18on, ...
* If you do: bc-fips, bcutil-fips, bcpkix-fips, ...
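Following the "one set of jars" rule, a pre-flight check could flag mixed variants before the JVM starts. This sketch is hypothetical (`BouncyCastleCheck` is not a real tool). It assumes jars keep their Maven-style names, such as `bcprov-jdk18on-1.78.jar`, so a renamed or shaded jar would slip past it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical check for the mix-ups described above, based only on file names.
final class BouncyCastleCheck {
    // Assumed naming scheme: bc<module>-<jdk variant>[-<version>].jar
    private static final Pattern VARIANT = Pattern.compile("^bc[a-z]+-(jdk[0-9a-z]+)[-.].*");

    static List<String> problems(List<String> jarNames) {
        Set<String> variants = new TreeSet<>();
        List<String> providers = new ArrayList<>();
        boolean fips = false;
        for (String name : jarNames) {
            Matcher m = VARIANT.matcher(name);
            if (m.matches()) {
                variants.add(m.group(1)); // e.g. "jdk14", "jdk18on"
            }
            if (name.startsWith("bc") && name.contains("-fips")) {
                fips = true;
            }
            if (name.startsWith("bcprov-") || name.startsWith("bc-fips")) {
                providers.add(name);
            }
        }
        List<String> out = new ArrayList<>();
        if (variants.size() > 1) {
            out.add("mixed Bouncy Castle JDK variants: " + variants);
        }
        if (fips && !variants.isEmpty()) {
            out.add("FIPS and non-FIPS Bouncy Castle jars on the same classpath");
        }
        if (providers.size() > 1) {
            out.add("more than one provider jar: " + providers);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> names;
        try (Stream<Path> s = Files.list(dir)) {
            names = s.map(p -> p.getFileName().toString()).collect(Collectors.toList());
        }
        problems(names).forEach(System.out::println);
    }
}
```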

o11c 3 months ago

It does matter for performance.

If you read files in the same order they are on disk (often, the order in which they were written, which readdir on modern filesystems should choose to produce), I/O is much faster.

eptcyka 3 months ago

Order of files listed in a directory need not match the order of the bytes saved in the physical media.
scrapheap 3 months ago

It's worth noting that the performance difference between sequential and non-sequential reads differs significantly between types of devices. It's much more noticeable on a spinning hard disk drive than on a solid-state drive.
bitwize 3 months ago

On spinning rust, sure. That does not hold for SSDs (which most consumer-grade computers have now).
- jbverschoor 3 months ago
  
  You’d still miss out on some potential prefetch cache hits.
- LoganDark 3 months ago
  
  Literally condemn any computer that still comes new from the factory with spinning rust. I was using SSDs back in 2012.
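Whether readdir order tracks disk layout depends on the filesystem. On a hashed ext4 directory it follows name hashes rather than placement, which is eptcyka's point. A common heuristic for spinning disks is to sort entries by inode number before reading them. This sketch assumes OpenJDK's implementation-specific `unix:ino` attribute, which is not part of the portable API, so the code falls back to the original order if the attribute is missing. `InodeOrder` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: orders a directory's entries by inode number, a rough
// proxy for on-disk placement on ext4 (a heuristic, not a guarantee).
final class InodeOrder {
    static List<Path> byInode(Path dir) throws IOException {
        List<Path> entries;
        try (Stream<Path> s = Files.list(dir)) {
            entries = s.collect(Collectors.toCollection(ArrayList::new));
        }
        Map<Path, Long> inode = new HashMap<>();
        try {
            for (Path p : entries) {
                // "unix:ino" is an OpenJDK implementation detail, not portable API.
                inode.put(p, (Long) Files.getAttribute(p, "unix:ino"));
            }
        } catch (UnsupportedOperationException | IllegalArgumentException e) {
            return entries; // attribute unavailable: keep directory order
        }
        entries.sort(Comparator.comparingLong((Path p) -> inode.get(p)));
        return entries;
    }

    public static void main(String[] args) throws IOException {
        for (Path p : byInode(Paths.get(args.length > 0 ? args[0] : "."))) {
            // A real tool would open and read each file here, in this order.
            System.out.println(p.getFileName());
        }
    }
}
```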

jonhohle 3 months ago

Build tools supporting duplicate class detection have existed for… well a long time. Ignore them at your own peril.
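The following toy version shows what such tools check. It is not any particular plugin's implementation. It maps every `.class` entry to the jars that contain it and keeps only entries found in more than one jar. `DuplicateClasses` is a hypothetical name. By assumption, it skips `META-INF/` (including multi-release copies) and `module-info.class`.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical duplicate-class report: class entry name -> jars containing it.
final class DuplicateClasses {
    static Map<String, List<Path>> find(List<Path> jars) throws IOException {
        Map<String, List<Path>> owners = new TreeMap<>();
        for (Path jar : jars) {
            try (JarFile jf = new JarFile(jar.toFile())) {
                Enumeration<JarEntry> entries = jf.entries();
                while (entries.hasMoreElements()) {
                    String name = entries.nextElement().getName();
                    if (name.endsWith(".class")
                            && !name.startsWith("META-INF/")
                            && !name.equals("module-info.class")) {
                        owners.computeIfAbsent(name, k -> new ArrayList<>()).add(jar);
                    }
                }
            }
        }
        // Keep only classes that more than one jar provides.
        owners.values().removeIf(list -> list.size() < 2);
        return owners;
    }

    public static void main(String[] args) throws IOException {
        List<Path> jars = new ArrayList<>();
        for (String a : args) {
            jars.add(Paths.get(a));
        }
        find(jars).forEach((cls, where) -> System.out.println(cls + " -> " + where));
    }
}
```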

rzzzt 3 months ago

The orange site discusses the article in the first footnote here: https://news.ycombinator.com/item?id=43573507

yjftsjthsd-h 3 months ago

Why "the orange site"?
- PMunch 3 months ago
  
  It was referenced in the article as "the orange site"; however, the reason for it initially being named as such is probably HN's system for keeping popularity from being artificially driven up. As far as I know, the details of this are pretty scarce, but the idea is that if you try to push something to the top of Hacker News, they somehow detect that and penalize you. So people have taken to calling it "the orange site" to avoid this detection when talking about HN.
  - froh 3 months ago
    
    how about it being a simple gentle nod to the plain design of HN.
    
    udev4096 3 months ago
    
    stop idolizing HN. look at the privacy policy and tell me if it still looks appealing to you
    
    selfhoster11 3 months ago
    
    Not parent, but yes. 100% yes. It loads quickly, has great content density, lacks tons of JavaScript that tanks performance on slower machines, reminds me of the older, better times. For the same reason, many people still prefer the old Reddit UI compared to the new UI.
- aargh_aargh 3 months ago
  
  It's a different calling convention. Call by value rather than call by name.
- tom_ 3 months ago
  
  See the first sentence of the article!
  - yjftsjthsd-h 3 months ago
    
    ?
    > the title is a cheeky reference to something at the front page of the orange site today
    Yes, that's what I'm asking. Why do people refer to HN as "the orange site"?
    
    tom_ 3 months ago
    
    Oh, right. Well, that's easy: it's an orange site.
    I've seen it typically (though not universally) used seemingly dismissively, so I've always assumed it was a euphemism. People very commonly refrain from naming a thing directly if they disapprove.
    Disclaimer: I'm no mind reader.

dathinab 3 months ago

Always fun when code relies on the order of iterating over a dir (which in general is clearly not defined to have any order; even iterating the same dir twice in a row might not yield the same order, depending on "stuff" (e.g. the exact file system used)).

So if order matters, always sort.

(Luckily in most situations where dir iter order matters, the performance impact from sorting is acceptable or even outright irrelevant.)
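In Java, the "always sort" rule amounts to one extra stream stage. This is a minimal sketch. `SortedListing` is a hypothetical name, and plain `String` ordering (UTF-16 code units, not locale-aware) is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists entry names in a fixed, filesystem-independent order.
final class SortedListing {
    static List<String> names(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.map(p -> p.getFileName().toString())
                    .sorted() // plain String order, the same on every run and filesystem
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        names(Paths.get(args.length > 0 ? args[0] : ".")).forEach(System.out::println);
    }
}
```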

amelius 3 months ago

By the way, max hardlink count for ext4 seems configured ridiculously low for modern standards, at least on Ubuntu.

amelius 3 months ago

"ls" can take ages on a large folder. Is there a way to make it more immediate, i.e. streaming output without sorting?

kristianp 3 months ago

It's something like ls -u from memory.
- amelius 3 months ago
  
  Looks like it's -U (capital U). But I just tried it and it still took several seconds for the first filename to appear. It was not the spinning up of the disk because I first did ls in the parent folder which was immediate. The second time I did ls on the large folder, though, it was fast (even without -U).
  - aaronmdjones 3 months ago
    
    > It was not the spinning up of the disk because I first did ls in the parent folder which was immediate
    That doesn't tell you anything; the parent's dentries could have been cached days ago and still present, meaning it didn't actually access the disk or cause it to be spun up (if it wasn't) at all.
    When doing any kind of repeatable measurement or experimentation on disks you will want to drop the page cache every time first:
    # echo 3 >/proc/sys/vm/drop_caches
    
    amelius 3 months ago
    
    Well, I just booted the computer ;) But you are right, dropping the cache is probably a better way.
    I have a folder with 5500 subfolders. Doing "ls -U" in that folder (after dropping the page cache like above) takes 50 seconds (!) And the dir entries appear all at once, i.e. not in a streaming way.
    Its parent folder only contains 6 subfolders. Doing a cache drop followed by "ls -U" gives immediate results.
    How to investigate this further? (Using an Ubuntu 18.04 system)
    
    rcxdude 3 months ago
    
    strace can tell you what system calls it's doing, what the results are, and how long they're taking, which may help narrow it down.
    
    amelius 3 months ago
    
    Thanks. Interestingly, strace speeds up the operation. What took 50s after a cache drop now becomes immediate with "strace -f ls -U".
    Dropping the cache and doing "time ls -U" gives:
    real 0m51,204s
    user 0m0,116s
    sys 0m0,718s
    Update: never mind, it appears to be something in my shell. Switching to tcsh completely eradicated the problem.
    
    MarceColl 3 months ago
    
    It's probably that ls in the other shell is the builtin instead of the binary. When strace runs ls -U, it runs the binary and not the shell builtin. tcsh must also delegate to the binary instead of a builtin, or its builtin is faster.
    
    amelius 3 months ago
    
    Yeah, I haven't figured it out yet, but when I do /bin/ls, then indeed the problem does not show. Probably a case of Bash trying to be smarter than it needs to be.
  - kristianp 3 months ago
    
    Yes, you're right, it's -U on Linux (1). On Mac it's -f (2). Linux also has -f, which is equivalent to -a -U.
    (1) https://man7.org/linux/man-pages/man1/ls.1.html
    (2) https://ss64.com/mac/ls.html
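For the original question (streaming names without sorting), the Java counterpart to `ls -U` is to iterate a `DirectoryStream` and handle each entry as it arrives, rather than collecting the whole list first. The sketch below uses the hypothetical name `StreamingLs`. It won't help if the delay comes from somewhere else, such as the shell issue found above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Consumer;

// Hypothetical unsorted, streaming directory listing.
final class StreamingLs {
    // Hands each entry to action as soon as the directory stream yields it,
    // without collecting or sorting. Returns the number of entries seen.
    static long forEachEntry(Path dir, Consumer<Path> action) throws IOException {
        long count = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                action.accept(p);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        forEachEntry(dir, p -> System.out.println(p.getFileName()));
    }
}
```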

cheshire_cat 3 months ago

Could they have avoided that issue by specifying the classpath without the star?

So -cp /jars/ instead of -cp /jars/*?

Kwpolska 3 months ago

So what was the production fix? Surely you're not hex-editing the image until the end of time?

amiga386 3 months ago

The production fix is don't include 3 versions of the same dependency in the image build (use "bcprov-jdk18on" and don't use any other "bcprov")
Another fix can be to use a fat jar (containing your software and all its dependencies), but this doesn't work for Bouncy Castle, because Cryptography Is Special(TM): Java won't load cryptography providers unless their jars are signed, and including the cryptography provider jar in the fat jar means it loses its signature.
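One way to see whether a jar's classes still carry signatures, for example after repackaging them into a fat jar, is to read each entry to the end and then ask for its code signers. `JarEntry.getCodeSigners()` returns null until the entry has been fully read. This is a sketch. `JarSignatureCheck` is a hypothetical name, and it only looks at `.class` entries.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical checker: reports whether every .class entry in a jar is signed.
final class JarSignatureCheck {
    static boolean allClassesSigned(Path jar) throws IOException {
        boolean sawClass = false;
        // verify = true makes JarFile check signatures while entries are read.
        try (JarFile jf = new JarFile(jar.toFile(), true)) {
            byte[] buf = new byte[8192];
            Enumeration<JarEntry> entries = jf.entries();
            while (entries.hasMoreElements()) {
                JarEntry e = entries.nextElement();
                if (e.isDirectory() || !e.getName().endsWith(".class")) {
                    continue;
                }
                sawClass = true;
                // getCodeSigners() stays null until the entry is read to the end.
                try (InputStream in = jf.getInputStream(e)) {
                    while (in.read(buf) != -1) {
                        // drain the entry
                    }
                }
                if (e.getCodeSigners() == null) {
                    return false;
                }
            }
        }
        return sawClass; // a jar with no classes is not treated as signed
    }

    public static void main(String[] args) throws IOException {
        for (String a : args) {
            boolean signed = allClassesSigned(Paths.get(a));
            System.out.println(a + ": " + (signed ? "signed" : "unsigned or partly unsigned"));
        }
    }
}
```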
- Kwpolska 3 months ago
  
  > The production fix is don't include 3 versions of the same dependency in the image build (use "bcprov-jdk18on" and don't use any other "bcprov")
  I doubt anyone is doing that manually, that’s probably done by mvn/gradle/sbt/whatever the cool Java kids use these days. Do the build tools not know about this problem and just make a mess?
  - amiga386 3 months ago
    
    It's Bouncy Castle's particular situation. The Java build tools are totally fine with resolving thousands of version dependencies so everyone is happy. You can depend on A which in turn depends on B version 1.2 and also depend on C which depends on D which depends on E version 1.1 and you only end up with one version of B included, version 1.2. Java execution environments also support all kinds of classloader isolation, so you can have multiple versions of the same jar and classes, all in the same JVM, only visible to the components that wanted to see them, so there's no clash.
    But Bouncy Castle - and almost nothing else - adds another dimension across its artifact names. This is not standard! You now have to watch your dependency trees like a hawk to see that some other artifact doesn't bring in <artifactId>bcprov-jdk14</artifactId> to fuck with your <artifactId>bcprov-jdk18on</artifactId>, and if they do, you need to slap an <exclusion> on that dependency's dependency.
    The reason Bouncy Castle does this is because it chooses to support some very old versions of Java that predate JDK 9 introducing multi-release jars (https://docs.oracle.com/en/java/javase/21/docs/specs/jar/jar...), which removed the need for differently named jars for different JDKs (...but only from JDK 9 onwards).
    So, in general, the Java tools have this solved, unless you're Bouncy Castle.

tryauuum 3 months ago

great article

I'm feeling like an old man now, but who the hell calls a tool "buildah"? Especially with its ugly dog logo. You can almost assume the dog wants to say "builder" but the extra flaps of skin make the sound distorted.

usr1106 3 months ago

At least it is search engine friendly. Recently I had to search for code snippets for the 30-year-old "expect" tool. It was rather difficult, and I thought, well, the Web is younger than that tool; they could not have imagined a search engine. Hint: "expect script" seemed to work decently well.
- eMPee584 3 months ago
  
  (.. or a search on https://pkgs.org to surface metadata)
davideg 3 months ago

Looks like it's a silly and self-aware play on the word "builder" (New England regional dialect):
> Since I’m relatively new to the world of containers and images, I was excited to learn about the Buildah tool. Especially since I’m a native New Englander and it’s a clever play on how we say Builder in these parts. [0]
[0] https://buildah.io/blogs/2017/06/22/introducing-buildah.html
- ghaff 3 months ago
  
  That is correct. The person who largely had overall responsibility for Red Hat’s open source container tooling is a Boston area native.
Waterluvian 3 months ago

Much like the choice to stop using language features like capitalization, it’s part of the current cultural trend.
Kinda like Buildly or Buildr. It’s cool until it’s your turn to be old. Then you look back and wince.
- llmthrow103 3 months ago
  
  I've been using no capitalization on short messages in chat for more than 20 years (and still do), but an entire article written in the style makes it harder to read. It's funny that the author believes in syntax highlighting for code readability but not capitalization for English readability.
  - thaumasiotes 3 months ago
    
    > but an entire article written in the style makes it harder to read
    That's purely a familiarity effect; it's a self-solving problem.
    
    thombat 3 months ago
    
    Perhaps I'd become effortlessly fluent in Aramaic if I had to read enough articles in it, but absent some substantial benefit I'd prefer to keep with standard English.
    
    thaumasiotes 3 months ago
    
    You would, but it would take a while. Reading slightly different letter forms is more of a matter of hours.
- jraph 3 months ago
  
  > it’s part of the current cultural trend
  Is it, or is it just a niche, like people who write 5-digit years by putting a 0 in front?
  It's still very rare to encounter any of those.
  - jerven 3 months ago
    
    Is it a current trend? My Mom does this, and she picked it up in the 70's on typewriters.
  - bongodongobob 3 months ago
    
    Those people are so short sighted. I put two 0's in front because I really care about humanity. This, I believe, will help fix climate change. Excuse me while I sniff my own farts.
- PMunch 3 months ago
  
  I mean, "buildah" is at least searchable (imagine trying to look for a build tool called "builder"). The lack of capitalization doesn't have any positive side effects, apart from saving your shift key some use.
throwaway127482 3 months ago

> who the hell calls a tool "buildah"?
Bostonians? :P
Brian_K_White 3 months ago

Since it's a name, I'm fine with it. That is actually some people's pronunciation, even if no one's spelling, but I have no problem taking them seriously since they are not simply putting an annoying affectation into writing; it's a name. Names have to be distinct, and they don't have to be cute, but it's also not exactly damning either.
All that said, probably wouldn't have been my choice either.
It's weird. I personally wouldn't want quite such a silly name for that particular kind of tool, but that is a funny thing for me to say because I was never one of the people who wanted to remove the swear words from the kernel because "professional impression". Don't ask me to explain it.
dathinab 3 months ago

> hell calls a tool "buildah"?
people who seem to have done a pretty good job
I mean, a branding logo for this kind of tool really doesn't matter, and if so, why should you hire a graphic designer to do it for you if you already have something that's passable?
You can read it as build-ah; "ah" is, in some languages, the word for the sound people make when they have an insight/light-bulb moment. It might also just be a coincidence, idk.
But most importantly it's nicely searchable word, it's memorable too, it's pronounceable and it's somewhat related to what it does (a "build" tool).
So in all the metrics which matter it's a good name.
benatkin 3 months ago

I like the dog logo. Thanks for calling attention to it, I now have something to ghiblify.

udev4096 3 months ago

ext4 has no checksums, integrity checks, etc. it will silently corrupt your data and you wouldn't even know about it. switch to btrfs, it's way better

mossyfog 3 months ago

fun read, now i want to learn about overlays