Inspecting streams for hashing purposes

This is a personal note on how to inspect async streams in order to hash their contents without storing and re-reading the stream. It might become the start of a series of little code snippets that could be interesting to others as well.

Code

As an example, we fetch a URL, store the body to a file and compute the SHA256 hash of it on the fly:

use anyhow::anyhow;
use futures::StreamExt;
use sha2::Digest;
use std::path::PathBuf;

#[tokio::main(flavor = "current_thread")]
async fn main() -> Result<(), anyhow::Error> {
    let arg = std::env::args()
        .nth(1)
        .ok_or(anyhow!("no URL given"))?;

    let url = url::Url::parse(&arg)?;
    let path = PathBuf::try_from(url.path())?;
    let filename = path
        .file_name()
        .ok_or(anyhow!("URL does not contain a filename"))?;
    let mut hasher = sha2::Sha256::new();

    let stream = reqwest::get(url)
        .await?
        .bytes_stream()
        .inspect(|bytes| {
            if let Ok(bytes) = bytes {
                hasher.update(bytes);
            }
        })
        .map(|chunk| chunk.map_err(|err| std::io::Error::new(std::io::ErrorKind::Other, err)));

    let mut reader = tokio_util::io::StreamReader::new(stream);
    let mut file = tokio::io::BufWriter::new(tokio::fs::File::create(filename).await?);
    tokio::io::copy(&mut reader, &mut file).await?;

    let sum = hasher.finalize();
    println!("{} {:?}", hex::encode(&sum), filename);

    Ok(())
}

Cargo.toml dependencies

[dependencies]
anyhow = "1.0.80"
futures = "0.3.30"
hex = "0.4.3"
reqwest = { version = "0.11.24", features = ["stream"] }
sha2 = "0.10.8"
tokio = { version = "1.36.0", features = ["fs", "io-util", "macros", "rt"] }
tokio-util = { version = "0.7.10", features = ["io"] }
url = "2.5.0"

Building rusticl

OpenCL has been with me for more than a decade, back when we decided to use it in our research project as the foundation for accelerating synchrotron imaging. As history has shown, OpenCL never really took off, partially because Apple (the initial sponsor) dropped it, but more importantly because NVIDIA has been very successful in locking people into its proprietary CUDA solution. Nevertheless, all major GPU vendors support it to some degree, so software can be accelerated in a somewhat portable way. AMD's approach is somewhat questionable though: they support OpenCL either via their open ROCm stack, but only for select GPUs and with short support windows, or via their proprietary amdgpu-pro packages. The latter is what I use today to enable OpenCL in Darktable, but it is a hack because it involves downloading Debian packages from their website and extracting them correctly.

Fast forward to 2022: Rust is on its way to becoming the premier systems language, and heroes like Karol Herbst have started writing OpenCL mesa drivers, completely alleviating the need for the crap AMD is offering (well, almost). Because building and using it is not very straightforward at the moment, here are some hints on how to do that. I am assuming an older Ubuntu 20.04 box, so some things could already be in the 22.04 repos.

Installing tools and dependencies

Add the LLVM apt repos

deb http://apt.llvm.org/focal/ llvm-toolchain-focal-15 main
deb-src http://apt.llvm.org/focal/ llvm-toolchain-focal-15 main

to /etc/apt/sources.list and run apt update. Install

$ apt install clang-15 libclang-15-dev llvm-15 llvm-15-dev llvm-15-tools

Ubuntu 20.04 comes with a pretty old version of meson, so let's create a virtualenv and install it along with mako, which is used by mesa itself:

$ python3 -mvenv .venv
$ source .venv/bin/activate
$ pip3 install meson mako

We also need bindgen to generate Rust bindings to C functions, but luckily the standalone bindgen program is sufficient and can be installed easily with

$ cargo install bindgen-cli

Build rusticl

At the moment the radeonsi changes are not yet merged into the main branch, hence

$ git remote add superhero https://gitlab.freedesktop.org/karolherbst/mesa.git
$ git fetch superhero
$ git checkout -t superhero/rusticl/si

For some reason, rusticl won’t build with the LLVM 15 libraries as-is, and we have to add clangSupport to src/gallium/targets/opencl/meson.build as yet another clang library to link against in order to find some RISC-V symbols. It’s time to configure the build with meson

$ meson .. -Dgallium-rusticl=true -Dllvm=enabled -Drust_std=2021 -Dvalgrind=disabled

Note that meson does not check for the existence of Valgrind on Ubuntu and enables it by default, causing build errors when the development libraries are not installed. Time to build and install using ninja

$ ninja && ninja install

Running OpenCL programs

I tend to install mesa into a custom prefix and pre-load it with my old shell script. In order to have the system-wide ICD loader find the ICD that points to rusticl, we have to set the OPENCL_VENDOR_PATH environment variable to the directory containing the .icd file, i.e. <some-prefix>/etc/OpenCL/vendors. We also have to set the RUSTICL_ENABLE environment variable to radeonsi because that driver is not enabled by default yet. With that set, clinfo should show a platform named rusticl.

Setting up rust-analyzer

If you intend to dig into rusticl itself, you will notice that this is not your bog-standard Cargo project but one intertwined with meson, which takes care of building the majority of the C and C++ sources. Because of this, rust-analyzer is not able to figure out the structure of the rusticl project on its own. Luckily, meson 0.64 produces a rust-project.json file that describes the structure, but unfortunately the paths in there seem to be a bit messed up. After symlinking it from the root of the Git repo (so rust-analyzer can find it) and changing the paths to point to existing directories, rust-analyzer was able to make sense of the project.


wastebin 2: electric boogaloo

It has been almost a month already since I released the first major breaking release of my minimalist pastebin. The main reason to bump the major version was the streamlining of routes, especially dropping the /api ones and adding query parameters where it made sense. Between my last post and version two, there have been many other non-breaking changes such as correct caching (of course …), more keybinds, a better-looking user interface, minor fixes and a demo site hosted here.

Currently, I am preparing everything to make the move to the upcoming breaking 0.6 release of axum. But more importantly, I am investigating ideas on how to get rid of syntect, the syntax highlighting library. My main issue with that library is that themes have to be in the Sublime Text theme format, which leaves a lot of nice light/dark themes on the table. My current approach is a tree-sitter based library that bundles a bunch of tree-sitter grammars and uses helix themes to highlight the parsed names. While it works alright, distributing it as a crate is a pain in the ass because only a fraction of grammar authors keep publishing updated grammars on crates.io. So, the next idea is perhaps bundling them via Git submodules. Let’s see.


Yet another pastebin

Pastebins are the next step in the evolution of a software developer, right after finishing hello worlds and static site generators. They are limited in terms of features (or not … ahem) but require some form of dynamism in order to receive and store user input and make it available upon request. Of course, everyone has different ideas about what a pastebin should do and in what language it should be written. And because I am in no way different, I had to write my own: the wastebin pastebin, which ticks the following boxes:

  • Written in Rust for ease of deployment.
  • SQLite instead of a full-fledged database server or flat files.
  • Paste expiration.
  • Minimalist appearance.
  • Syntax highlighting.
  • Line numbers.

bin – from which wastebin takes huge inspiration in terms of UI – was almost there, but the lack of expiration and the flat-file storage were a no-go. Moreover, I sincerely think axum has a more solid foundation than Rocket. Enough reasons to do it myself.


Serve static content with axum

One of Rust’s nice properties is producing statically linked binaries, which makes deployment simple and straightforward. In some cases this is not enough and additional data is required for proper function, for example static data for web servers. With dependencies such as include_dir and mime_guess, this is a piece of cake to integrate into axum though.

Using include_dir, we first declare a variable that represents the data currently located in the static directory:

use include_dir::{include_dir, Dir};

static STATIC_DIR: Dir<'_> = include_dir!("$CARGO_MANIFEST_DIR/static");

Now we define the static data route, passing *path to denote that we want to match the entire remaining path.

use axum::routing::get;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let app = axum::Router::new()
        .route("/static/*path", get(static_path));

    let addr = std::net::SocketAddr::from(([0, 0, 0, 0], 3000));

    axum::Server::bind(&addr)
        .serve(app.into_make_service())
        .await?;

    Ok(())
}

Note that we cannot use the newly added typed path functionality in axum-extra. Now onto the actual route handler:

use axum::body::{self, Empty, Full};
use axum::extract::Path;
use axum::http::{header, HeaderValue, Response, StatusCode};
use axum::response::IntoResponse;

async fn static_path(Path(path): Path<String>) -> impl IntoResponse {
    let path = path.trim_start_matches('/');
    let mime_type = mime_guess::from_path(path).first_or_text_plain();

    match STATIC_DIR.get_file(path) {
        None => Response::builder()
            .status(StatusCode::NOT_FOUND)
            .body(body::boxed(Empty::new()))
            .unwrap(),
        Some(file) => Response::builder()
            .status(StatusCode::OK)
            .header(
                header::CONTENT_TYPE,
                HeaderValue::from_str(mime_type.as_ref()).unwrap(),
            )
            .body(body::boxed(Full::from(file.contents())))
            .unwrap(),
    }
}

As you can see, we first strip any initial slash and then use the mime_guess crate to guess a MIME type from the path. If we are not able to, we just assume text/plain, however wrong that may be. Then we try to locate the file at that path and either return a 404 or a 200 with the actual file contents. Easy as pie.