Back to Basics: File Reading with the New IO in Zig

Embracing the New Input/Output API in Zig 0.15.1

Zig 0.15.1 introduced a significant overhaul of its standard library’s IO capabilities, unveiling a powerful new set of Reader and Writer APIs. This revamp means many existing code examples and assumptions about file handling are now outdated, with some familiar functions (like readUntilDelimiterOrEof) having been removed.

It’s time to go back to the basics and understand how to work with files effectively in this new landscape. This post walks through a practical example: reading data from a file and splitting it by a delimiter, exploring the essential new APIs along the way.

The Example

Let’s start by examining the example code. This program demonstrates opening a file, reading its contents, and printing segments separated by a user-defined delimiter (defaulting to a newline).

read_lines.zig

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.DebugAllocator(.{}){};
    defer _ = gpa.deinit();
    const alloc = gpa.allocator();

    var argv = try std.process.argsWithAllocator(alloc);
    defer argv.deinit();
    _ = argv.next();
    const filename  = if (argv.next()) |a| std.mem.sliceTo(a, 0) else "test.txt";
    const delimiter = if (argv.next()) |a| std.mem.sliceTo(a, 0) else "\n";

    // Open the file and create a buffered File.Reader.
    // Buffer size is kept small to demonstrate delimiters spanning buffer boundaries.
    // In production, use a more sensible buffer size (e.g., 4KB or 8KB).
    var file = try std.fs.cwd().openFile(filename, .{ .mode = .read_only });
    defer file.close();
    var read_buf: [2]u8 = undefined;
    var file_reader: std.fs.File.Reader = file.reader(&read_buf);

    // Pointer to the std.Io.Reader interface to use the generic IO functions.
    const reader = &file_reader.interface;

    // An accumulating writer to store data read from the file.
    var line = std.Io.Writer.Allocating.init(alloc);
    defer line.deinit();

    // Main loop to read data segment by segment.
    while (true) {
        _ = reader.streamDelimiter(&line.writer, delimiter[0]) catch |err| {
            if (err == error.EndOfStream) break else return err;
        };
        reader.toss(1);                 // skip the delimiter byte (toss returns void).
        std.debug.print("{s}\n", .{ line.written() });
        line.clearRetainingCapacity();  // reset the accumulating buffer.
    }

    // Handle any remaining data after the last delimiter.
    if (line.written().len > 0) {
        std.debug.print("{s}\n", .{ line.written() });
    }
}

To run this example, save it as read_lines.zig and create a test.txt file (or specify another filename) with some lines of text:

This is a test.
Testing 1 2 3.

After a blank line.
More test

Then execute from the terminal:

zig run read_lines.zig -- test.txt

This will print each line of test.txt.

Obtaining the Reader Interface

In the new IO API, the caller supplies the buffer that backs buffered operations. This is evident when creating the File.Reader object (the excerpt here uses a 1024-byte buffer rather than the deliberately tiny one in the full example):

var read_buf: [1024]u8 = undefined;
var file_reader: std.fs.File.Reader = file.reader(&read_buf);

Most high-level IO operations are performed through standard interfaces. std.fs.File.Reader is an implementation of the std.Io.Reader interface. It’s common practice to obtain a pointer to this interface:

const reader = &file_reader.interface;

Caution: It’s crucial to obtain a pointer to the interface (&file_reader.interface) rather than copying the interface object.

const reader = file_reader.interface; // DON'T DO THIS

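The pointer references the inner std.Io.Reader struct embedded within the std.fs.File.Reader implementation. Methods of the interface often need to reach the “outer” parent implementation struct for context and specific details (e.g., std.fs.File.Reader.stream()), and they recover it from the interface pointer itself (via @fieldParentPtr). Copying the interface creates an isolated copy on the stack that no longer sits inside a File.Reader, so the computed parent pointer refers to unrelated memory. The result is undefined behavior, which typically shows up as a crash or corrupted reads at runtime.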
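One practical consequence of working through the interface pointer is that helper code can accept a *std.Io.Reader and stay agnostic about what sits behind it. The following is a minimal sketch, assuming Zig 0.15.1; firstSegment is a hypothetical helper name, and an in-memory reader created with std.Io.Reader.fixed stands in for the file reader so the example needs no file on disk.

```zig
const std = @import("std");

/// Hypothetical helper (not part of the original program): streams the first
/// segment up to `delimiter` into `out` and returns the number of bytes streamed.
/// It only sees the interface pointer, so any std.Io.Reader implementation works.
fn firstSegment(reader: *std.Io.Reader, out: *std.Io.Writer, delimiter: u8) !usize {
    return reader.streamDelimiter(out, delimiter);
}

pub fn main() !void {
    // An in-memory reader; a file-backed caller would pass &file_reader.interface.
    var in_memory: std.Io.Reader = .fixed("alpha,beta");

    // A fixed-buffer writer is enough for this small demonstration.
    var buf: [32]u8 = undefined;
    var out: std.Io.Writer = .fixed(&buf);

    const n = try firstSegment(&in_memory, &out, ',');
    std.debug.print("{d} bytes: {s}\n", .{ n, out.buffered() });
}
```

With the file-based program above, the same helper would be called as firstSegment(&file_reader.interface, &line.writer, delimiter[0]), always passing the address of the interface field rather than a copy.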

Accumulating Read Data

The program uses std.Io.Writer.Allocating to dynamically accumulate data as it’s read. This is useful when processing data of unknown or varying sizes, preventing buffer overflows and simplifying memory management.

var line = std.Io.Writer.Allocating.init(alloc);
defer line.deinit();

The core reading operation happens with reader.streamDelimiter():

_ = reader.streamDelimiter(&line.writer, delimiter[0]) catch |err| {
    if (err == error.EndOfStream) break else return err;
};

This function reads data from the reader and writes it into line.writer until it encounters the delimiter byte. Note that streamDelimiter does not include the delimiter in the written output. The delimiter byte remains in the input stream. To advance past it, call:

reader.toss(1);
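To see that the delimiter really stays in the input stream, one option is to peek at the next byte before and after the toss. This is a small sketch, assuming Zig 0.15.1; segmentThenPeek is a hypothetical helper, and the input comes from an in-memory reader rather than a file.

```zig
const std = @import("std");

/// Hypothetical helper: streams one segment into `out`, then reports the byte
/// sitting at the head of the stream without consuming it.
fn segmentThenPeek(reader: *std.Io.Reader, out: *std.Io.Writer, delimiter: u8) !u8 {
    _ = try reader.streamDelimiter(out, delimiter);
    return reader.peekByte();
}

pub fn main() !void {
    var reader: std.Io.Reader = .fixed("ab,cd");
    var buf: [16]u8 = undefined;
    var out: std.Io.Writer = .fixed(&buf);

    // The segment excludes the delimiter, and the delimiter is the next byte to read.
    const next = try segmentThenPeek(&reader, &out, ',');
    std.debug.print("segment={s} next={c}\n", .{ out.buffered(), next });

    // Step over the delimiter; the stream now starts at the following segment.
    reader.toss(1);
    std.debug.print("after toss: {c}\n", .{try reader.peekByte()});
}
```

Forgetting the toss would leave the reader parked on the delimiter, so every later streamDelimiter call would return an empty segment and the loop would never advance.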

The accumulated data can be retrieved as a slice using line.written():

std.debug.print("{s}\n", .{ line.written() });

Caution: line.written() returns a slice to the Allocating writer’s internal buffer. This buffer can be reallocated and moved in memory if it needs to expand. Therefore, the returned slice becomes invalid after any subsequent write operation that might trigger a reallocation.
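If a segment has to outlive the next write, one way to stay safe is to copy it out of the accumulating buffer first. The sketch below assumes Zig 0.15.1; snapshot is a hypothetical helper name, and the caller owns (and must free) the returned copy.

```zig
const std = @import("std");

/// Hypothetical helper: returns an owned copy of the bytes accumulated so far.
/// The copy stays valid no matter what happens to the writer afterwards.
fn snapshot(alloc: std.mem.Allocator, line: *std.Io.Writer.Allocating) ![]u8 {
    return alloc.dupe(u8, line.written());
}

pub fn main() !void {
    var gpa = std.heap.DebugAllocator(.{}){};
    defer _ = gpa.deinit();
    const alloc = gpa.allocator();

    var line = std.Io.Writer.Allocating.init(alloc);
    defer line.deinit();

    try line.writer.writeAll("first");
    const kept = try snapshot(alloc, &line); // copy while the slice is still valid
    defer alloc.free(kept);

    // Reusing the writer may overwrite or reallocate its internal buffer.
    line.clearRetainingCapacity();
    try line.writer.writeAll("a much longer second segment that may force a reallocation");

    std.debug.print("kept={s}\ncurrent={s}\n", .{ kept, line.written() });
}
```

In the example program this copy is unnecessary, because each segment is printed before the buffer is cleared and reused.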

Handling the Last Part

It’s common for files not to end with a delimiter. When streamDelimiter reaches the end of the input without finding another delimiter, it returns error.EndOfStream and the while loop exits. However, any data read after the last delimiter and before the actual end of the file has already been written to the line buffer.

    if (line.written().len > 0) {
        std.debug.print("{s}\n", .{ line.written() });
    }

This check ensures that any remaining data in the buffer after the loop finishes is also processed and printed.

To test the last part, pass a custom delimiter as the second command-line argument. Only its first byte is used (delimiter[0]). To split the data on commas:

zig run read_lines.zig -- test2.txt ,

Let test2.txt contain comma-separated values:

abc,def,xyz,123,456

The program will output each field:

abc
def
xyz
123
456

This demonstrates the flexibility of streamDelimiter() for parsing various delimited data formats. After the last delimiter has been consumed, the next streamDelimiter() call keeps streaming the remaining bytes into the buffer until it hits EndOfStream. The final if (line.written().len > 0) check ensures this trailing segment is handled.
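The loop and the trailing check can also be exercised without any file by feeding the same logic from an in-memory reader. This is a minimal sketch, assuming Zig 0.15.1; writeSegments is a hypothetical helper that mirrors the program's loop, and it wraps each segment in brackets so empty and trailing segments are easy to spot.

```zig
const std = @import("std");

/// Hypothetical helper mirroring the main loop of read_lines.zig: writes each
/// delimiter-separated segment to `out` as "[segment]\n" and returns the count.
fn writeSegments(
    alloc: std.mem.Allocator,
    reader: *std.Io.Reader,
    out: *std.Io.Writer,
    delimiter: u8,
) !usize {
    var line = std.Io.Writer.Allocating.init(alloc);
    defer line.deinit();

    var count: usize = 0;
    while (true) {
        _ = reader.streamDelimiter(&line.writer, delimiter) catch |err| switch (err) {
            error.EndOfStream => break, // no further delimiter in the input
            else => |e| return e,
        };
        reader.toss(1); // skip the delimiter byte
        try out.print("[{s}]\n", .{line.written()});
        count += 1;
        line.clearRetainingCapacity();
    }

    // Trailing data after the last delimiter is still in the accumulator.
    if (line.written().len > 0) {
        try out.print("[{s}]\n", .{line.written()});
        count += 1;
    }
    return count;
}

pub fn main() !void {
    var gpa = std.heap.DebugAllocator(.{}){};
    defer _ = gpa.deinit();

    var reader: std.Io.Reader = .fixed("abc,def,xyz,123,456");
    var buf: [64]u8 = undefined;
    var out: std.Io.Writer = .fixed(&buf);

    const count = try writeSegments(gpa.allocator(), &reader, &out, ',');
    std.debug.print("{d} segments:\n{s}", .{ count, out.buffered() });
}
```

Without the final check, the last field (456) would be silently dropped, while an input that ends with the delimiter (for example "a,b,") leaves the accumulator empty and produces no extra segment.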
