By Daniel Hines
Esy is an indispensable part of our workflow at Marigold. Its Nix-inspired sandboxed builds give us a lot of flexibility in sourcing and building our codebases, which is essential as we utilize the yet-to-be-released OCaml multicore and package things for more exotic environments like Windows and OSX.
My experience has been that Esy really is the easiest, most “works-out-of-the-box” way to get started with an OCaml project. You’re not likely to run into difficult-to-debug issues if you stick to the main path, which has been well-polished by the Esy maintainers.
Esy provides a subcommand, `esy x`, for executing scripts in the sandboxed environment. `esy x` will automatically rebuild the environment when the source code has changed, making it a great way to quickly iterate while testing out commands. However, the analysis of the source adds some overhead - most of the time it’s not even noticeable, but in one of our development scripts where we ran `esy x` over and over on an unchanged directory, it was adding up.
What if, instead of running `esy x our_command`, we simply got the path: `our_command=$(esy x which our_command)`? This indeed made our scripts faster, but sometimes it would fail with this error:
```
error while loading shared libraries: libev.so.4: cannot open shared object file: No such file or directory
```
When I first saw the error, I didn’t know where it was coming from or how to begin troubleshooting it. Yet with some help from my coworkers, we solved it quite easily. Along the way, I learned a few tricks about Esy, the GCC toolchain, and how to debug issues with native code. In this article, I’ll share them with you in a simplified Esy project.
Because the topics of GCC, Linux binaries, etc. are quite deep, I cannot cover them here with the detail they deserve. Instead, I will link to good entrypoints into these topics and show how they applied to a particular OCaml toolchain issue.
Hello World in Dream
Let’s create a simple project using the Dream web framework. You can follow along here, or see the code on GitHub. We’ll implement the first commit here.
Learning how to create a new Esy project is a valuable thing in itself, but it can be tricky if you’re new to it, so we’ll review the process here.
We first need a package.json file that Esy will read to fetch our dependencies. Additionally, we tell Esy how to build the package with Dune.
```json
{
  "name": "hello",
  "esy": {
    "build": "dune build -p #{self.name}"
  },
  "dependencies": {
    "@opam/dream": "*"
  }
}
```
We’ll also need a top-level `dune` file specifying how Dune should build the project.
```
(executable
 (name hello)
 (public_name hello)
 (libraries dream))
```
Dune requires at least one opam file, even an empty one, so create one[^1].
```bash
touch ./hello.opam
```
Now we can create `hello.ml`:
```ocaml
let () =
  Dream.run (fun _ ->
    Dream.html "Hello world!\n")
```
Next, install our dependencies and build the project by running `esy install` followed by `esy build` in the project root.
At last, we can run our executable with `esy x`:
```bash
> esy x hello
11.02.22 15:11:30.831 Running at http://localhost:8080
11.02.22 15:11:30.831 Type Ctrl+C to stop
```
If you’re new to the OCaml stack, all this can be a bit overwhelming. The relationship between Esy, Dune and OPAM is not always clear. For this reason, I highly recommend reading the getting started tutorial, which explains Esy’s sandboxing process.
The Trouble Begins
Suppose we want to automate the use of our new web server in some way. In my case, I was working on the Deku sidechain, which has a simple shell script to automate bringing up a cluster of nodes running on different ports for development.
The `esy x` command incurs some overhead, since it first checks the source tree and builds a new environment if there are any changes. On my machine, the overhead was about 300 to 400 milliseconds - not noticeable if you’re only running it once, but when you’re running a script many times with many invocations of `esy x ...`, it starts to become noticeable. To mitigate this overhead, our script builds the project once and then just executes the resultant binary multiple times. You can get the path of our webserver binary like this:
```bash
> esy x which hello
```
However, if you try to execute this binary, you'll get our shared library error:
```bash
> $(esy x which hello)
/home/d4hines/repos/hello-dream/_esy/default/store/i/hello_dream-f5ee1aa3/bin/hello: error while loading shared libraries: libev.so.4: cannot open shared object file: No such file or directory
```
This kind of error hints that something can't find a C library it expects. To solve it, we'll need to understand a bit about how binaries work on Linux.
The tools we'll need to debug this issue come from a few standard packages (`file`, `coreutils`, `binutils`, and glibc), which are installed by default or readily available on pretty much every Linux distro. Let's use the `file` command to get some basic info about our executable:
```bash
> file $(esy x which hello)
...
```
You’ll see the file is a symbolic link. It’s actually a link to a link, so let’s follow it recursively:
```bash
> file $(readlink -f $(esy x which hello))
/home/d4hines/repos/hello-dream/_esy/default/store/b/debugging_gcc_issues-f5ee1aa3/default/hello.exe: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=9892683e7f69fa536d97d39f6b83e5e68a2576b6, for GNU/Linux 4.4.0, with debug_info, not stripped
```
We can see that the file is an executable structured in the Executable and Linkable Format (ELF). ELF is a format for providing the kernel with hints about how a binary should be executed, and is the standard format used by many Unix-like systems.
The details of the ELF format are beyond what I can cover here - I recommend this brief guide for an overview of the structure. Additionally, this guide covers many of the commands we’ll use today in more detail.
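As a quick sanity check on what `file` reports, every ELF file starts with the same four magic bytes: `0x7f` followed by the ASCII letters `ELF`. The sketch below checks for them using only `head`, `od`, and `tr`. The helper name `is_elf` is my own invention for illustration, not part of any toolchain.

```shell
# Hypothetical helper: succeed if the file at $1 starts with the ELF magic.
is_elf() {
  # Read the first 4 bytes and print them as hex with no separators.
  magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
  # 7f 45 4c 46 is 0x7f followed by "ELF".
  [ "$magic" = "7f454c46" ]
}

# Usage (assumes the project from this article has been built with Esy):
# is_elf "$(readlink -f "$(esy x which hello)")" && echo "ELF binary"
```

A shell script or a text file fails this check, which is a cheap way to tell a wrapper script apart from a real native binary.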
Another thing we can see is that the executable is dynamically linked. This means that it references library code that is not included in the binary itself. Instead, per the ELF format, the binary includes references to shared object files that provide the library code. To execute the program, an interpreter must be used that sets up the environment of the executable before running it (you can see from the output this is `/lib64/ld-linux-x86-64.so.2` in our case). To succeed, this interpreter must know where to find the shared object files (usually ending in `.so`).
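The interpreter path that `file` printed is stored in the binary's program headers, which `readelf -l` displays. As a small sketch, the filter below pulls that path out of `readelf -l` output. The name `elf_interpreter` is made up for this example.

```shell
# Hypothetical filter: read `readelf -l` output on stdin and print the
# requested program interpreter, if the binary has one.
elf_interpreter() {
  sed -n 's/.*Requesting program interpreter: \(.*\)]/\1/p'
}

# Usage (assumes binutils is installed and the project has been built):
# readelf -l "$(esy x which hello)" | elf_interpreter
```

A statically linked binary has no interpreter entry, so the filter prints nothing for it.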
The Missing Library
The `binutils` package provides many tools for analyzing and manipulating ELF files.
In the ELF format, all information needed for dynamic linking is found in the `.dynamic` section. We can read this section with the `readelf -d` command:
```
> readelf -d $(esy x which hello)
Dynamic section at offset 0x507d68 contains 33 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libssl.so.1.1]
0x0000000000000001 (NEEDED) Shared library: [libcrypto.so.1.1]
0x0000000000000001 (NEEDED) Shared library: [librt.so.1]
0x0000000000000001 (NEEDED) Shared library: [libev.so.4]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [libdl.so.2]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x000000000000000c (INIT) 0x247000
0x000000000000000d (FINI) 0x46e6f0
0x0000000000000019 (INIT_ARRAY) 0x508cb0
0x000000000000001b (INIT_ARRAYSZ) 8 (bytes)
0x000000000000001a (FINI_ARRAY) 0x508cb8
0x000000000000001c (FINI_ARRAYSZ) 8 (bytes)
0x000000006ffffef5 (GNU_HASH) 0x3c0
0x0000000000000005 (STRTAB) 0x89490
0x0000000000000006 (SYMTAB) 0x24af8
0x000000000000000a (STRSZ) 501063 (bytes)
0x000000000000000b (SYMENT) 24 (bytes)
0x0000000000000015 (DEBUG) 0x0
0x0000000000000003 (PLTGOT) 0x509000
0x0000000000000002 (PLTRELSZ) 8736 (bytes)
0x0000000000000014 (PLTREL) RELA
0x0000000000000017 (JMPREL) 0x244ae8
0x0000000000000007 (RELA) 0x10c200
0x0000000000000008 (RELASZ) 1280232 (bytes)
0x0000000000000009 (RELAENT) 24 (bytes)
0x000000006ffffffb (FLAGS_1) Flags: PIE
0x000000006ffffffe (VERNEED) 0x10c000
0x000000006fffffff (VERNEEDNUM) 6
0x000000006ffffff0 (VERSYM) 0x1039d8
0x000000006ffffff9 (RELACOUNT) 53296
0x0000000000000000 (NULL) 0x0
```
With this command we can see which shared libraries are required by our executable. Using the `ldd` command, we can see where the interpreter will look for them:
```
> ldd $(esy x which hello)
linux-vdso.so.1 (0x00007ffd47191000)
libssl.so.1.1 => /usr/lib/libssl.so.1.1 (0x00007f0e3d836000)
libcrypto.so.1.1 => /usr/lib/libcrypto.so.1.1 (0x00007f0e3d555000)
librt.so.1 => /usr/lib/librt.so.1 (0x00007f0e3d54a000)
libev.so.4 => not found
libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f0e3d529000)
libm.so.6 => /usr/lib/libm.so.6 (0x00007f0e3d3e5000)
libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f0e3d3dc000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f0e3d210000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f0e3e0dd000)
```
We've now unraveled one layer of our initial error: it seems that the system is able to locate all the necessary shared libraries except for `libev.so.4`, which is `not found`, leading to our error.
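When the output is long, it can help to filter it down to just the interesting lines. The two filters below are a minimal sketch: one lists the `NEEDED` entries from `readelf -d`, the other lists only the libraries that `ldd` reports as `not found`. The function names `needed_libs` and `missing_libs` are hypothetical.

```shell
# Hypothetical filters; both read tool output on stdin.

# From `readelf -d` output, print each NEEDED shared library name.
needed_libs() {
  sed -n 's/.*(NEEDED).*\[\(.*\)]/\1/p'
}

# From `ldd` output, print each library that could not be resolved.
missing_libs() {
  awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Usage (assumes the project from this article has been built with Esy):
# bin=$(esy x which hello)
# readelf -d "$bin" | needed_libs
# ldd "$bin" | missing_libs
```

Run outside the Esy sandbox against the binary above, the second filter should print just `libev.so.4`.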
The Linker and the Esy Sandbox
By default, the linker determines the paths to shared libraries using the file `/etc/ld.so.cache`. We can view these paths using the `ldconfig -p` command:
```
> ldconfig -p
...
```
This will list all the system libraries. You can add to this list by installing libraries with your system package manager; however, you can only install one version at a time. This is not convenient for development, where you may want to experiment with many versions of a library in different packages (even different versions of the same package).
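Since the full listing is long, a quick way to confirm that a particular library is absent from the system cache is to filter the output by name. The filter below is a sketch with a hypothetical name, `cached_lib_path`. It assumes the usual `ldconfig -p` line layout of `name (flags) => path`.

```shell
# Hypothetical filter: read `ldconfig -p` output on stdin and print the
# path(s) the cache records for the library name given as $1.
cached_lib_path() {
  awk -v lib="$1" '$1 == lib { print $NF }'
}

# Usage: prints nothing if the library is not in the system cache.
# ldconfig -p | cached_lib_path libev.so.4
```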
The linker allows you to specify extra paths to search with the `LD_LIBRARY_PATH` environment variable. Esy uses this variable in its sandboxes to allow linking to Esy-controlled dependencies, making Esy builds reproducible and allowing multiple versions of a library to coexist on a single machine.
We can see this at work by entering the Esy sandbox with `esy shell`:
```
> esy shell
> echo $LD_LIBRARY_PATH
/home/d4hines/.esy/3_________________________________________________________________/i/esy_libev-4.33.1-c8cb0882/lib:/home/d4hines/.esy/3_________________________________________________________________/i/esy_openssl-0ec6341a/lib:
```
While in the esy shell, `ldd` can use this variable to locate `libev` correctly:
```
> ldd $(esy x which hello)
linux-vdso.so.1 (0x00007ffcd6186000)
libssl.so.1.1 => /home/d4hines/.esy/3_________________________________________________________________/i/esy_openssl-0ec6341a/lib/libssl.so.1.1 (0x00007f22b002d000)
libcrypto.so.1.1 => /home/d4hines/.esy/3_________________________________________________________________/i/esy_openssl-0ec6341a/lib/libcrypto.so.1.1 (0x00007f22afd43000)
librt.so.1 => /usr/lib/librt.so.1 (0x00007f22afd1c000)
libev.so.4 => /home/d4hines/.esy/3_________________________________________________________________/i/esy_libev-4.33.1-c8cb0882/lib/libev.so.4 (0x00007f22afd0a000)
libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f22afce9000)
libm.so.6 => /usr/lib/libm.so.6 (0x00007f22afba3000)
libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f22afb9c000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f22af9d0000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f22b08bc000)
```
You can see that several libraries are linked to the Esy sandbox versions instead of my system-installed versions. This could have caused problems if the two versions were incompatible.
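To see which entry of `LD_LIBRARY_PATH` actually supplies a given library, you can walk the colon-separated list yourself. The helper below is only a sketch, and its name `find_in_lib_path` is an assumption for illustration. It skips empty entries and does not reproduce every rule the real loader follows.

```shell
# Hypothetical helper: print the first directory in a colon-separated
# search path ($2) that contains the library file named $1.
find_in_lib_path() {
  lib=$1
  old_ifs=$IFS
  IFS=:
  set -f        # disable globbing while splitting
  set -- $2     # unquoted on purpose: split the path list on ":"
  set +f
  IFS=$old_ifs
  for dir in "$@"; do
    # Empty entries (e.g. from a trailing ":") are skipped here.
    if [ -n "$dir" ] && [ -e "$dir/$lib" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Usage (inside `esy shell`, where LD_LIBRARY_PATH is set by Esy):
# find_in_lib_path libev.so.4 "$LD_LIBRARY_PATH"
```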
The Quick Fix
For the purposes of our automation script, all we needed to do was set the `LD_LIBRARY_PATH` before running the executable:
```
> export LD_LIBRARY_PATH=$(esy x sh -c 'echo $LD_LIBRARY_PATH')
> $(esy x which hello)
23.02.22 14:39:58.163 Running at http://localhost:8080
23.02.22 14:39:58.163 Type Ctrl+C to stop
```
Our web server works again[^2]!
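Exporting `LD_LIBRARY_PATH` changes it for everything else the script launches afterwards. A slightly narrower variant, sketched below, resolves the library path and the binary once and then passes the variable only to the processes that need it. The helper name `run_with_lib_path` and the usage lines are illustrative assumptions, not the actual Deku script.

```shell
# Hypothetical helper: run a command with LD_LIBRARY_PATH set to $1 for
# that command only, leaving the calling shell's environment untouched.
run_with_lib_path() {
  lib_path=$1
  shift
  env LD_LIBRARY_PATH="$lib_path" "$@"
}

# Usage (assumes Esy is installed and the project has been built):
# lib_path=$(esy x sh -c 'echo $LD_LIBRARY_PATH')   # resolved once
# hello=$(esy x which hello)                        # resolved once
# run_with_lib_path "$lib_path" "$hello"
```

Because both values are computed up front, repeated launches avoid the per-invocation overhead of `esy x`.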
A More Robust Fix
The above fix only works because we used Esy to install `libev` locally, and thus could link to it. But this requires that users have Esy installed and go through the process of installing the project's dependencies, which is a hassle.
It would be nice to be able to give them a single binary that works out-of-the-box.
One solution is to just list `libev` as a required system dependency. Thus, users must first run `sudo apt install libev4`, `sudo pacman -S libev`, or similar (the package name varies by distro) before the executable will run. In this case, the linker would look for `libev` in the usual places and find it.
But it would be even nicer if our binary had no runtime dependencies beyond the basics provided by Linux. We can achieve this through the process of static linking. Static linking is analogous to what Webpack and Rollup do in the frontend world: the program and all its required libraries are bundled together into a single file.
However, statically linking OCaml code with Esy is a bit involved. We’ll leave the process for a follow-up post.
Conclusion
We used standard Linux tools like `file`, `readelf`, and `ldd`, together with Esy, to successfully debug our native code issue. However, we've only scratched the surface of both of these toolchains. Native programming is full of "wtf??" moments (let's face it - all programming is like this), but it genuinely gets better after a while.
My hope is that this article gave you some hints to get started debugging issues you encounter in the wild, as well as some references to further explore.
Have fun compiling!
[^1]: The empty `*.opam` file is needed for legacy reasons; however, it's possible for Dune to generate the correct `.opam` file for you. See the Dune docs.
[^2]: I spoke to an Esy maintainer about our quick fix, and their stance was that while this fix worked in our case, it may not work in every case. `esy x` is slower but safer in this respect.