* Improved distfiles subsystem
@ 2026-03-20  8:39 Sergey Matveev
From: Sergey Matveev @ 2026-03-20  8:39 UTC
  To: bass

Greetings!

There is not much happening on this mailing list, simply because the
BASS skel subsystem has not changed much for years, being pretty steady.

But over recent months the distfiles-related download subsystem has
improved a lot, making BASS the most flexible and powerful packaging
system.

Initially BASS used Metalink4 (.meta4) files to list possible download
URLs and the checksums to verify against. .meta4 files were also used
inside skelpkgs to protect the package's integrity. That requires
XML-aware software, which is far from trivial and lightweight.
Moreover, a distfile's .meta4 listed only a single URL.

A proper distfiles download system MUST support specifying and using
multiple URLs. .meta4 allows that, but download software, as a rule,
does not. And most importantly: how to choose which URL is preferred
among the available ones? Various networks have various connectivity
and availability problems. Large CDNs could easily block whole
countries, or vice versa (countries will block large CDNs). Do we need
to get the distfile as quickly as possible, even if that involves
transit traffic exchange? Or do we prefer to localise it, trying to
utilise peering links and avoid transit altogether?

First of all: nothing forces you to use anything described below.
Any distfiles download target is just an ordinary redo target, so you
are free to use it as you wish.

Today's BASS distfiles use "metadir"s instead of .meta4 files. A metadir
is just an ordinary directory holding several files:

    -- build/distfiles/meta/less-692.tar.gz/size --
    987633

    -- build/distfiles/meta/less-692.tar.gz/hashes --
    blake3-256 21dd0ae858ca02990cdc...
    blake2b-512 379d7738894f16fed1b...
    blake2b-256 1c5ee18b5152b9e09c9...
    skein-512 6d3ceb2770a7e2d2f0872...
    shake128 69dc559a582ddb977d2929...
    shake256 4b7deb3358dd4fafbe9294...
    sha-512 57a2d2b8c45c93550ab3f4c...
    sha-256 61300f603798ecf1d778657...
    streebog-512 6aac121b22c483dadc...
    streebog-256 375bdd4469366ed0af...
    xxh3-128 e7c0ea62217240017bda9f...

    -- build/distfiles/meta/less-692.tar.gz/urls.do --
    fn=$(basename $(pwd))
    echo "1|us|https://greenwoodsoftware.com/less/$fn"
    redo-ifchange ../../lib/urls-for-gnu
    PRI=2 ../../lib/urls-for-gnu less/$fn

As you may notice, it could be trivially converted to a .meta4 file. There
are build/distfiles/bin/{meta4-to-metadir,metadir-to-meta4,metadir-from-file}
helpers to deal with all of that. Those files are easy to deal with in an
ordinary POSIX shell environment, without involving complex XML libraries.
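
To illustrate how little is needed, here is a minimal sketch of checking
a downloaded file against a metadir in plain shell. It is not the actual
build/bin/hashes-check: the check_metadir name is made up for this
example, only the size and the sha-256 line are examined, and a
coreutils-style sha256sum command is assumed to be available.

```shell
#!/bin/sh
# check_metadir METADIR FILE (hypothetical helper, not part of BASS):
# compare FILE against METADIR/size and the "sha-256" line of
# METADIR/hashes. Assumes full hex digests and a coreutils-style
# sha256sum; both are assumptions of this sketch.
check_metadir() {
    meta=$1 file=$2
    want_size=$(cat "$meta/size")
    # Some wc implementations pad the number with spaces, so strip them.
    have_size=$(wc -c <"$file" | tr -d ' ')
    [ "$have_size" = "$want_size" ] || {
        echo "size mismatch" >&2
        return 1
    }
    # The hashes file holds "ALGORITHM HEXDIGEST" pairs, one per line.
    want_hash=$(awk '$1 == "sha-256" { print $2 }' "$meta/hashes")
    have_hash=$(sha256sum <"$file" | cut -d' ' -f1)
    [ -n "$want_hash" ] && [ "$have_hash" = "$want_hash" ] || {
        echo "sha-256 mismatch" >&2
        return 1
    }
}
```

It would be called like "check_metadir meta/less-692.tar.gz
dl/less-692.tar.gz", where the second path is only an example location.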

Each URL resembles an entry from Metalink4, with optional priority and
location attributes. Strings of the form "123|ru|http://..." are used.
For example, the "less" pager mentioned above distributes its tarballs
through its own server and the GNU mirror system.

    -- distfiles/meta/freetype-2.14.2.tar.xz/urls.do --
    redo-ifchange ../utils/urls-for-savannah ../utils/urls-for-sourceforge
    ../utils/urls-for-savannah freetype/freetype-2.14.2.tar.xz
    PRI=5 ../utils/urls-for-sourceforge freetype freetype2/2.14.2/freetype-2.14.2.tar.xz

Freetype is distributed from both GNU Savannah and SourceForge mirrors,
and also has a cached copy in NetBSD's distcache system:

    -- distfiles/meta/freetype-2.14.2.tar.xz/urls.do --
    fn=$(basename $(pwd))
    redo-ifchange \
        ../../lib/urls-for-savannah \
        ../../lib/urls-for-sourceforge \
        ../../lib/urls-for-distcache-NetBSD
    ../../lib/urls-for-savannah freetype/$fn
    PRI=5 ../../lib/urls-for-sourceforge freetype freetype2/2.14.2/$fn
    ../../lib/urls-for-distcache-NetBSD $fn

All those redo targets and utilities are used solely to generate a list
of possible download URLs:

    -- distfiles/meta/freetype-2.14.2.tar.xz/urls --
    1||http://download.savannah.nongnu.org/releases/freetype/freetype-2.14.2.tar.xz
    2||https://download.savannah.nongnu.org/releases/freetype/freetype-2.14.2.tar.xz
    3|at|http://mirror.easyname.at/nongnu/freetype/freetype-2.14.2.tar.xz
    4|at|https://mirror.easyname.at/nongnu/freetype/freetype-2.14.2.tar.xz
    3|at|http://mirror.kumi.systems/nongnu/freetype/freetype-2.14.2.tar.xz
    4|at|https://mirror.kumi.systems/nongnu/freetype/freetype-2.14.2.tar.xz
    [...]
    5||https://sourceforge.net/projects/freetype/files/freetype2/2.14.2/freetype-2.14.2.tar.xz/download
    6|ar|http://sitsa.dl.sourceforge.net/project/freetype/freetype2/2.14.2/freetype-2.14.2.tar.xz?viasf=1
    7|ar|https://sitsa.dl.sourceforge.net/project/freetype/freetype2/2.14.2/freetype-2.14.2.tar.xz?viasf=1
    6|au|http://ixpeering.dl.sourceforge.net/project/freetype/freetype2/2.14.2/freetype-2.14.2.tar.xz?viasf=1
    7|au|https://ixpeering.dl.sourceforge.net/project/freetype/freetype2/2.14.2/freetype-2.14.2.tar.xz?viasf=1
    [...]
    20|xf|http://cdn.NetBSD.org/pub/pkgsrc/distfiles/freetype-2.14.2.tar.xz
    21|xf|https://cdn.NetBSD.org/pub/pkgsrc/distfiles/freetype-2.14.2.tar.xz
    25|us|http://ftp.NetBSD.org/pub/pkgsrc/distfiles/freetype-2.14.2.tar.xz
    26|us|https://ftp.NetBSD.org/pub/pkgsrc/distfiles/freetype-2.14.2.tar.xz

BASS suggests the following rules for setting priorities:

* balancers > mirrors > CDNs > homepage > GitHub
* an HTTP link should get higher priority (that is, a lower number)
  than its HTTPS counterpart
* distcache.FreeBSD.org and cdn.NetBSD.org/pub/pkgsrc/distfiles
  are used as the lowest-priority fallback
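
The HTTP-before-HTTPS rule means that each mirror contributes a pair of
entries with consecutive priorities, as seen in the freetype list above.
A minimal sketch of emitting such a pair follows. emit_pair is a
hypothetical helper, not one of the real urls-for-* utilities, whose
internals are not shown here.

```shell
#!/bin/sh
# emit_pair PRI LOC HOSTPATH (hypothetical helper for illustration):
# print the HTTP URL with priority PRI and the HTTPS one with PRI+1,
# in the "PRI|LOC|URL" format. LOC may be empty.
emit_pair() {
    pri=$1 loc=$2 hostpath=$3
    printf '%d|%s|http://%s\n' "$pri" "$loc" "$hostpath"
    printf '%d|%s|https://%s\n' "$((pri + 1))" "$loc" "$hostpath"
}
```

For example, "emit_pair 3 at
mirror.easyname.at/nongnu/freetype/freetype-2.14.2.tar.xz" prints the
two easyname.at lines from the list above.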

The "xx"-reserved country code namespace is used in BASS to denote CDN
networks: xa=Akamai xc=Cloudflare xf=Fastly xg=GitHub xm=Amazon.
They often tend to work badly or slowly, so users may want to set a
lower priority for them. And most importantly, if all of us "move" to
the world of huge CDNs, no mirrors worth supporting will be left,
meaning a single point of availability.

How can we use them? If we sort them numerically, their priorities
already suggest the preferable order of download attempts. But that is
a deterministic and stable list, so in the freetype example above we
will always hit Austria's server first when savannah.nongnu.org's
front balancer is not available. We would like to shuffle the list
inside each priority group. Personally, I would prefer Russian-located
servers, because non-transit peering will likely be involved. Then I
would prefer Denmark/Finland/Netherlands/Sweden ones, as we tend to have
good fat trunk connections and peering with them. Then all the
remaining European-based ones, with North American ones next. The lowest
priority goes to Ukrainian servers (as there is nearly no connectivity
to them from our networks) and to Fastly and Cloudflare ones. I am also
aware that the less homepage (greenwoodsoftware) is not available from
our networks at all. How can we express all those rules? Easily, by
overriding the FETCHER_URLS_SORT function!

    -- bass/build/rc-local --
    [...]
    FETCHER_URLS_SORT() {
        grep -v greenwoodsoftware |
        $DISTFILES/lib/urls-sort ru "dk fi nl se" '!ua' '!xf' '!xc' "" c:eu c:na rand
    }

As a reminder, "xf" and "xc" are the Fastly and Cloudflare CDNs. The
"c:" notation references a continent, a group of countries. "rand" just
shuffles the URLs. The utility also implicitly sorts URLs by priority.
An empty country ("") means prioritising the location-less URLs, which
tend to be entry-point load balancers.
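
The "shuffle inside each priority group" idea mentioned above can be
sketched with nothing but awk, sort and cut. This is not the real
urls-sort (it knows nothing about countries or continents), and the
shuffle_by_priority name is made up for the example.

```shell
#!/bin/sh
# shuffle_by_priority (hypothetical, not the real urls-sort):
# read "PRI|LOC|URL" lines on stdin, order them by numeric priority
# and randomise the order of lines sharing the same priority.
shuffle_by_priority() {
    # Prefix each line with "PRI|RANDOM|" to get two sort keys.
    awk -F'|' 'BEGIN { srand() }
        { printf "%s|%d|%s\n", $1, int(rand() * 1000000), $0 }' |
    sort -t'|' -k1,1n -k2,2n |
    # Drop the two helper fields, leaving the original line.
    cut -d'|' -f3-
}
```

Note that awk's srand() is seeded from the current time, so two runs
within the same second may produce the same order.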

Adding the "-6" option to urls-sort will leave only IPv6-capable
addresses. It is done literally by issuing a DNS request, so filtering a
huge list of URLs may take some time when the DNS cache is not hot enough.
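
A rough sketch of what such DNS-based filtering could look like is given
below. It is only a guess at the approach, not the code behind "-6":
url_host, has_aaaa and filter_ipv6 are hypothetical names, and the
lookup assumes that the "host" utility from BIND is installed.

```shell
#!/bin/sh
# url_host URL: print the host part of a URL. This simple version
# ignores userinfo ("user@") and bracketed IPv6 literals.
url_host() {
    h=${1#*://}   # strip the scheme
    h=${h%%/*}    # strip the path
    h=${h%%:*}    # strip the port, if any
    printf '%s\n' "$h"
}

# has_aaaa HOST: succeed if HOST resolves to an IPv6 address.
# Assumes BIND's "host" utility; any other resolver tool would do.
has_aaaa() {
    host -t AAAA "$1" 2>/dev/null | grep -q 'IPv6 address'
}

# filter_ipv6: keep only "PRI|LOC|URL" lines whose host has an AAAA
# record. One DNS query is issued per line.
filter_ipv6() {
    while IFS='|' read -r pri loc url; do
        if has_aaaa "$(url_host "$url")"; then
            printf '%s|%s|%s\n' "$pri" "$loc" "$url"
        fi
    done
}
```

One query per URL is exactly why a cold DNS cache makes this slow on
long lists.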

Previously BASS used either "meta4ra-check -dl" (later meta4ra-dl),
"wget with .meta4 support", or "aria2" for fetching the tarballs.
meta4ra-check was used to check the file's integrity against the
.meta4's hashes. build/bin/hashes-check is a POSIX shell alternative.
build/bin/hashes-gen can replace the meta4ra-create utility without
losing any multiprocessing ability. Because of that, we do not have to
use the meta4ra utilities anymore. That means less dependency on
Go-written software, which may limit the possible BASS target
platforms. By default any of "fetch", "wget", "curl", "meta4ra-dl" can
be used for fetching, without any .meta4 support at all. You can use any
tool that takes a URL as an argument and streams the downloaded file to
stdout.
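
Given that contract (URL as an argument, file on stdout), walking
through a sorted URL list until one download succeeds takes only a few
lines. The sketch below is an illustration, not the BASS download
target itself: try_urls and its arguments are hypothetical, and only the
size is checked here, while real verification should also compare the
hashes.

```shell
#!/bin/sh
# try_urls FETCH_CMD WANT_SIZE OUT (hypothetical helper):
# read "PRI|LOC|URL" lines on stdin, already sorted by preference, and
# try them in order. FETCH_CMD must take a URL as its last argument and
# write the file to stdout. Stops at the first download whose size
# matches WANT_SIZE.
try_urls() {
    fetch_cmd=$1 want_size=$2 out=$3
    while IFS='|' read -r pri loc url; do
        # </dev/null keeps the fetcher from eating the URL list.
        if $fetch_cmd "$url" >"$out.tmp" </dev/null &&
            [ "$(wc -c <"$out.tmp" | tr -d ' ')" = "$want_size" ]; then
            mv "$out.tmp" "$out"
            return 0
        fi
        rm -f "$out.tmp"
    done
    return 1
}
```

It could be used like "FETCHER_URLS_SORT <urls | try_urls 'curl -fsSL'
987633 less-692.tar.gz", with the size taken from the metadir.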

Actually, the only Go-built dependency still left is the goredo redo
implementation. Absolutely no .do file uses any of its specific
capabilities, so you are free to use any redo implementation.

One special issue is software which does not provide any tarballs,
software that lives solely inside a VCS. For example, GitHub offers the
ability to get a tarball of a specific Git tag, but generated tarballs
are not guaranteed to stay the same. Any archiving/compression algorithm
may change and the tarball's checksums won't pass verification. So we
have to clone the repository and create tarballs manually. Most BASS
skels already did this before. Now there is a helper to ease that
frequent use case:

    -- distfiles/dl/brotli-1.2.0.tar.zst.do --
    ../lib/git-to-tarball brotli \
        https://github.com/google/brotli.git \
        028fb5a23661f123017c060daa546b55cf4bde29 \
        ${1%.tar.zst}

The git-with-submodules-to-tarball utility will also fetch all
submodules and generate concatenated pax archives with their contents.
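
For readers unfamiliar with the manual approach, the core step can be
sketched with "git archive" on a local clone. Whether git-to-tarball
works this way internally is an assumption on my side, and
git_commit_to_tar is a made-up name. The output is repeatable for a
given commit and git version, which is the property the checksums need.

```shell
#!/bin/sh
# git_commit_to_tar REPO COMMIT PREFIX (hypothetical helper):
# write a tar stream of COMMIT from the local clone REPO to stdout,
# placing every path under PREFIX/. Compression is left to the caller.
git_commit_to_tar() {
    git -C "$1" archive --format=tar --prefix="$3/" "$2"
}
```

After cloning, something like "git_commit_to_tar brotli
028fb5a23661f123017c060daa546b55cf4bde29 brotli-1.2.0 | zstd
>brotli-1.2.0.tar.zst" would produce the distfile, assuming zstd is
installed.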

Previously .meta4 files included *PGP/SSH signatures of the tarballs.
But initially they lacked a reference to where they were downloaded
from. The .meta4 file sections were extended to include the URL(s) of
those signatures. Current BASS also contains those links in separate
metadirs, and additionally includes the signatures themselves as
downloaded files in the $DISTFILES/dl/cache directory. Previously many
signatures were converted to ASCII armour format, because of being
binary. Now this is not an issue.

-- 
Sergey Matveev (http://www.stargrave.org/)
LibrePGP: 12AD 3268 9C66 0D42 6967  FD75 CB82 0563 2107 AD8A
