userbinator 2 days ago

> the actual argument value the JVM receives is "/jars/*", and in turn decides to be helpful, and expand the wildcard anyway

Whenever I see such things, I immediately think "whatever the resulting order is, it had better not matter"; and if it does, which is definitely true for Java classpaths, I consider it a bug that needs to be fixed ASAP, before it causes what happened in the article.

  • kevincox 2 days ago

    Yeah, my jaw was dropping as I realized how far they went with this, from checking their mount options to reading ext4 source code. Directory order is almost always an implementation detail (I'm pretty sure it is on ext4), and even if it isn't, you still shouldn't rely on it (for when someone decides to migrate your production machines to BTRFS because they want snapshots and now your app has some weird breakage). The problem is that the app depends on directory order, and you need to fix that, not figure out how you can predict the directory order.

    Maybe there should be a mount option to randomize directory order that people can use in their staging environments.

    • MrDrMcCoy a day ago

      > Maybe there should be a mount option to randomize directory order that people can use in their staging environments.

      The behavior I've witnessed suggests that the order is based on inode numbering, which is initially sequential from creation time, and drifts semi-randomly as inodes are unlinked and reused. I don't know this for a fact, but it makes enough sense. Directory ordering should be assumed to be random in all cases, as you suggest.

wpollock 2 days ago

Counting on the order of files to support multiple versions of jars was never a good idea. Java has had multi-release jar files for this use case since Java 9. See <https://docs.oracle.com/en/java/javase/24/docs/api/java.base...>.

Since duplicates on the classpath don't cause problems, a quick & dirty fix is to manually list versioned jars first, in order, then the jars/* argument.
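
A minimal sketch of that quick fix, assuming the JVM searches classpath entries in the order given, so the first match wins. The helper name `build_classpath` and the jar path are hypothetical; substitute whatever your image actually ships.

```python
import os

def build_classpath(pinned_jars, wildcard_dir):
    """Return a -cp value: pinned jars in the given order, then dir/*."""
    entries = list(pinned_jars)
    # The JVM expands a trailing "*" itself, in an unspecified order,
    # so the pinned entries go first to win the class lookup.
    entries.append(os.path.join(wildcard_dir, "*"))
    return os.pathsep.join(entries)

# Hypothetical jar name, for illustration only.
classpath = build_classpath(["/jars/bcprov-jdk18on.jar"], "/jars")
```

The result would be passed as a single, quoted `-cp` argument so the shell does not expand the `*` first.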

yjftsjthsd-h 2 days ago

> there was a client library that needed a Bouncy Castle “provider” with a version “jdk15”+ as the client initialization used specific properties from a class, and those properties were only available in “jdk15”+.

> up until the node image update, we “fortunately” had node images with directory hash seeds ordering “jdk15” or “jdk18” before “jdk14”.

So the actual bug is that something needing jdk15+ should either retry or be deterministically fed a valid file, right? And this whole article is figuring out why the filesystem coincidentally masked it by accidentally always happening to hand it a file with what it needed?

  • amiga386 2 days ago

    > something needing jdk15+

    Actually, no, that "15" refers to Java 1.5, aka Java 5, released 2004. Bouncy Castle has some funky variants, specially for Java 1.1, 1.2, 1.3, 1.4, 5, 6, 7, 8. All you actually need is the Bouncy Castle for Java 8 onwards, which is pretty much all versions of Java in use today.

    The bug is that multiple providers of Bouncy Castle don't cleanly work when in the classpath together. The authors of Bouncy Castle aren't changing that, because they're like "use our software correctly, please". It's not Java's fault, you can only make classes that don't work on old versions of the JDK, you can't make new Java somehow notice you've included a jar written specifically for an old version of the JVM.

    Java did introduce the ability to create multi-release jar files, where you can have JDK-version-specific classes/resources in one jar file... but only from Java 9 onwards. All this mixing and matching by filename that Bouncy Castle uses is for Java 1.1 - Java 1.8 only.

    You can also mix and match and cause failure by using one of the Bouncy Castle JCE provider variants with the wrong corresponding "pkix", "util", "mail" jars (extra jars for all the things you might want to do with cryptography that _aren't_ part of the standardised Java Cryptography Extensions API that the main "provider" jar implements). And you can also mess up by mixing FIPS-approved BC with FIPS-not-approved BC.

    You only need one set of jars:

    * If you don't need FIPS approval: bcprov-jdk18on, bcutil-jdk18on, bcpkix-jdk18on, ...

    * If you do: bc-fips, bcutil-fips, bcpkix-fips, ...

o11c 2 days ago

It does matter for performance.

If you read files in the same order they are on disk (often, the order in which they were written, which readdir on modern filesystems should choose to produce), I/O is much faster.
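
A rough sketch of reading in something closer to on-disk order. Sorting by inode number is a commonly used heuristic on ext-family filesystems; treating it as a proxy for physical layout is an assumption here, not something readdir guarantees. `files_in_inode_order` is a made-up helper name.

```python
import os

def files_in_inode_order(directory):
    """Return regular files in a directory, sorted by inode number."""
    with os.scandir(directory) as entries:
        # Assumption: inode order roughly tracks on-disk order.
        files = [
            (entry.inode(), entry.path)
            for entry in entries
            if entry.is_file(follow_symlinks=False)
        ]
    return [path for _, path in sorted(files)]
```

This only helps throughput; anything that needs a specific order for correctness should still sort explicitly.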

  • eptcyka 2 days ago

    The order of files listed in a directory need not match the order of the bytes saved on the physical media.

  • scrapheap 2 days ago

    It's worth noticing that the performance difference between sequential and non-sequential reads will differ significantly between types of devices. It's much more noticeable on a spinning hard disk drive than it is on a solid-state drive.

  • bitwize 2 days ago

    On spinning rust, sure. That does not hold for SSDs (which most consumer-grade computers have now).

    • jbverschoor 2 days ago

      You’d still miss out on some potential prefetch cache hits

    • LoganDark 2 days ago

      Literally condemn any computer that still comes new from the factory with spinning rust. I was using SSDs back in 2012.

jonhohle 2 days ago

Build tools supporting duplicate class detection have existed for… well a long time. Ignore them at your own peril.
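
Not one of those build tools, just a sketch of the idea under simple assumptions: a jar is a zip file, and two jars clash when they contain the same `.class` entry name. `find_duplicate_classes` is a hypothetical helper.

```python
import zipfile
from collections import defaultdict

def find_duplicate_classes(jar_paths):
    """Map each .class entry to the jars containing it, keeping only duplicates."""
    owners = defaultdict(list)
    # Sort the input so the report itself does not depend on listing order.
    for jar in sorted(jar_paths):
        with zipfile.ZipFile(jar) as zf:
            for name in zf.namelist():
                if name.endswith(".class"):
                    owners[name].append(jar)
    return {name: jars for name, jars in owners.items() if len(jars) > 1}
```

Running this over everything in the image's jar directory and failing the build on a non-empty result would have flagged the three `bcprov` variants up front. Modular jars each carry a root `module-info.class`, so a real check would probably want to ignore that entry.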

rzzzt 2 days ago

The orange site discusses the article in the first footnote here: https://news.ycombinator.com/item?id=43573507

  • yjftsjthsd-h 2 days ago

    Why "the orange site"?

    • PMunch 2 days ago

      It was referenced in the article as "the orange site". The reason for the name in the first place is probably HN's system for keeping popularity from being driven artificially high. The details of this are, as far as I know, pretty scarce, but the idea is that if you try to get to the top of Hacker News, they somehow detect that and penalize you. So people have taken to calling it "the orange site" in order to avoid this detection when talking about HN.

      • froh 2 days ago

        how about it being a simple gentle nod to the plain design of HN.

        • udev4096 2 days ago

          stop idolizing HN. look at the privacy policy and tell me if it still looks appealing to you

          • selfhoster11 2 days ago

            Not parent, but yes. 100% yes. It loads quickly, has great content density, lacks tons of JavaScript that tanks performance on slower machines, reminds me of the older, better times. For the same reason, many people still prefer the old Reddit UI compared to the new UI.

    • tom_ 2 days ago

      See the first sentence of the article!

      • yjftsjthsd-h 2 days ago

        ?

        > the title is a cheeky reference to something at the front page of the orange site today

        Yes, that's what I'm asking. Why do people refer to HN as "the orange site"?

        • tom_ a day ago

          Oh, right. Well, that's easy: it's an orange site.

          I've seen it typically (though not universally) used seemingly dismissively, so I've always assumed it was a euphemism. People very commonly refrain from naming a thing directly if they disapprove.

          Disclaimer: I'm no mind reader.

dathinab 2 days ago

Always fun when code relies on the order of iterating over a dir (which in general is clearly not defined to have any order; even iterating the same dir twice in a row might not yield the same order depending on "stuff" (e.g. the exact file system used)).

So if order matters, always sort.

(Luckily in most situations where dir iter order matters, the performance impact from sorting is acceptable or even outright irrelevant.)
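
A minimal sketch of "always sort", with `sorted_files` as a hypothetical helper name:

```python
import os

def sorted_files(directory):
    """List regular files in a directory in a stable, name-sorted order."""
    # os.scandir yields entries in whatever order the filesystem returns them.
    with os.scandir(directory) as entries:
        names = [entry.name for entry in entries if entry.is_file()]
    # Sorting by name makes the result independent of the filesystem.
    return [os.path.join(directory, name) for name in sorted(names)]
```

Sorting by name is just one choice; any total order that does not depend on the filesystem works.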

amelius 2 days ago

By the way, max hardlink count for ext4 seems configured ridiculously low for modern standards, at least on Ubuntu.

amelius 2 days ago

"ls" can take ages on a large folder. Is there a way to make it more immediate, i.e. streaming output without sorting?

  • kristianp 2 days ago

    It's something like ls -u from memory.

    • amelius 2 days ago

      Looks like it's -U (capital U). But I just tried it and it still took several seconds for the first filename to appear. It was not the spinning up of the disk because I first did ls in the parent folder which was immediate. The second time I did ls on the large folder, though, it was fast (even without -U).

      • aaronmdjones 2 days ago

        > It was not the spinning up of the disk because I first did ls in the parent folder which was immediate

        That doesn't tell you anything; the parent's dentries could have been cached days ago and still present, meaning it didn't actually access the disk or cause it to be spun up (if it wasn't) at all.

        When doing any kind of repeatable measurement or experimentation on disks you will want to drop the page cache every time first:

            # echo 3 >/proc/sys/vm/drop_caches

        • amelius 2 days ago

          Well, I just booted the computer ;) But you are right, dropping the cache is probably a better way.

          I have a folder with 5500 subfolders. Doing "ls -U" in that folder (after dropping the page cache like above) takes 50 seconds (!) And the dir entries appear all at once, i.e. not in a streaming way.

          Its parent folder only contains 6 subfolders. Doing a cache drop followed by "ls -U" gives immediate results.

          How to investigate this further? (Using an Ubuntu 18.04 system)

          • rcxdude 2 days ago

            strace can tell you what system calls it's doing, what the results are, and how long they're taking, which may help narrow it down.

            • amelius 2 days ago

              Thanks. Interestingly, strace speeds up the operation. What took 50s after a cache drop now becomes immediate with "strace -f ls -U".

              Dropping the cache and doing "time ls -U" gives:

                  real   0m51,204s
                  user   0m0,116s
                  sys    0m0,718s
              
              Update: never mind, it appears to be something in my shell. Switching to tcsh completely eradicated the problem.

              • MarceColl 2 days ago

                It's probably that ls in the other shell is the builtin instead of the binary, when strace runs ls -U it does run the binary and not the shell builtin. tcsh must also delegate to the binary instead of a builtin, or their builtin is faster.

                • amelius 2 days ago

                  Yeah, I haven't figured it out yet, but when I do /bin/ls, then indeed the problem does not show. Probably a case of Bash trying to be smarter than it needs to be.

cheshire_cat 2 days ago

Could they have avoided that issue by specifying the classpath without the star?

So -cp /jars/ instead of -cp /jars/*?

Kwpolska 2 days ago

So what was the production fix? Surely you're not hex-editing the image until the end of time?

  • amiga386 2 days ago

    The production fix is don't include 3 versions of the same dependency in the image build (use "bcprov-jdk18on" and don't use any other "bcprov")

    Another fix can be to use a fat jar (containing your software and all its dependencies), but this doesn't work for Bouncy Castle, because Cryptography Is Special(TM), and Java won't load cryptography providers unless their jars are signed, and including the cryptography provider jar in the fat jar means it loses its signature.

    • Kwpolska 2 days ago

      > The production fix is don't include 3 versions of the same dependency in the image build (use "bcprov-jdk18on" and don't use any other "bcprov")

      I doubt anyone is doing that manually, that’s probably done by mvn/gradle/sbt/whatever the cool Java kids use these days. Do the build tools not know about this problem and just make a mess?

      • amiga386 2 days ago

        It's Bouncy Castle's particular situation. The Java build tools are totally fine with resolving thousands of version dependencies so everyone is happy. You can depend on A which in turn depends on B version 1.2 and also depend on C which depends on D which depends on E version 1.1 and you only end up with one version of B included, version 1.2. Java execution environments also support all kinds of classloader isolation so you can have multiple versions of the same jar and classes, all in the same JVM, only visible to the components that wanted to see them, so there's no clash.

        But Bouncy Castle - and almost nothing else - adds another dimension across its artifact names. This is not standard! You now have to watch your dependency trees like a hawk to see that some other artifact doesn't bring in <artifactId>bcprov-jdk14</artifactId> to fuck with your <artifactId>bcprov-jdk18on</artifactId>, and if they do, you need to slap an <exclusion> on that dependency's dependency.

        The reason Bouncy Castle does this is that it chooses to support some very old versions of Java that predate JDK 9, which introduced multi-release jars (https://docs.oracle.com/en/java/javase/21/docs/specs/jar/jar...) and removed the need for differently named jars for different JDKs (...but only from JDK 9 onwards)

        So, in general, the Java tools have this solved, unless you're Bouncy Castle.

tryauuum 5 days ago

great article

I'm feeling like an old man now but who the hell calls a tool "buildah"? Especially with its ugly dog logo. You can almost assume the dog wants to say "builder" but the extra flaps of skin make the sound distorted

  • usr1106 2 days ago

    At least it is search engine friendly. Recently had to search for code snippets for the 30 year old "expect" tool. Was rather difficult and I thought, well the Web is younger than that tool, they could not imagine a search engine. Hint: "expect script" seemed to work decently well.

  • davideg 2 days ago

    Looks like it's a silly and self-aware play on the word "builder" (New England regional dialect):

    > Since I’m relatively new to the world of containers and images, I was excited to learn about the Buildah tool. Especially since I’m a native New Englander and it’s a clever play on how we say Builder in these parts. [0]

    [0] https://buildah.io/blogs/2017/06/22/introducing-buildah.html

    • ghaff 2 days ago

      That is correct. The person who largely had overall responsibility for Red Hat’s open source container tooling is a Boston area native.

  • Waterluvian 2 days ago

    Much like the choice to stop using language features like capitalization, it’s part of the current cultural trend.

    Kinda like Buildly or Buildr. It’s cool until it’s your turn to be old. Then you look back and wince.

    • llmthrow103 2 days ago

      I've been using no capitalization on short messages in chat for more than 20 years (and still do), but an entire article written in the style makes it harder to read. It's funny that the author believes in syntax highlighting for code readability but not capitalization for English readability.

      • thaumasiotes 2 days ago

        > but an entire article written in the style makes it harder to read

        That's purely a familiarity effect; it's a self-solving problem.

        • thombat 2 days ago

          Perhaps I'd become effortlessly fluent in Aramaic if I had to read enough articles in it, but absent some substantial benefit I'd prefer to keep with standard English.

          • thaumasiotes a day ago

            You would, but it would take a while. Reading slightly different letter forms is more of a matter of hours.

    • jraph 2 days ago

      > it’s part of the current cultural trend

      Is it, or is it just a niche, like people who write 5-digit years, putting a 0 in front?

      It's still very rare to encounter any of those.

      • jerven 2 days ago

        Is it a current trend? my Mom does this and she picked it up in the 70's on typewriters.

      • bongodongobob 2 days ago

        Those people are so short sighted. I put two 0's in front because I really care about humanity. This, I believe, will help fix climate change. Excuse me while I sniff my own farts.

    • PMunch 2 days ago

      I mean "buildah" is at least searchable (imagine trying to look for a build tool called "builder"). The lack of capitalization doesn't have any positive side-effects, apart from saving your shift key some use..

  • throwaway127482 2 days ago

    > who the hell calls a tool "buildah"?

    Bostonians? :P

  • Brian_K_White 2 days ago

    Since it's a name I'm fine with it. That is actually some people's pronunciation, even if no one's spelling, but I have no problem taking them seriously since they are not simply putting annoying affectation into writing, it's a name. Names have to be distinct, and they don't have to be cute but it's also not exactly damning either.

    All that said, probably wouldn't have been my choice either.

    It's weird. I personally wouldn't want quite such a silly name for that particular kind of tool, but that is a funny thing for me to say because I was never one of the people who wanted to remove the swear words from the kernel because "professional impression". Don't ask me to explain it.

  • dathinab 2 days ago

    > hell calls a tool "buildah"?

    people who seem to have done a pretty good job

    I mean, the branding logo for this kind of tool really doesn't matter, and even if it did, why should you hire a graphic designer to do that for you if you already have something which is passable?

    You can read it as build-ah; "ah" is in some languages the word for the sound people make when they have an insight/light bulb moment. It might also just be a coincidence, idk.

    But most importantly it's a nicely searchable word, it's memorable too, it's pronounceable and it's somewhat related to what it does (a "build" tool).

    So in all the metrics which matter it's a good name.

  • benatkin 2 days ago

    I like the dog logo. Thanks for calling attention to it, I now have something to ghiblify.

udev4096 2 days ago

ext4 has no data checksums or data integrity checks (only optional metadata checksums). if your data gets silently corrupted, you wouldn't even know about it. switch to btrfs, it's way better

mossyfog 2 days ago

fun read, now i want to learn about overlays