Writing a Hex Dump Utility in Rust

Trent May
6 min read · Nov 30, 2020

Rust is a fairly new programming language on the block, but it is a language that enjoys so much love in the programming community. This love has persisted the last several years, so I wanted to see what all the hype was about. In order to do that, I had to write some Rust, but I did not want to stop with the generic “Hello, world!” I had to decide on and write something that would give me a good overview of the language and let me have some fun while I was doing it.

I decided to write a utility that would allow me to output a dump of a file in hex format; this would essentially be a version of the tool known as xxd on Linux. For the sake of simplicity and this post, I only decided to imitate the basic functionality of the tool. Might save the rest for later posts :-). With that, let’s start writing some Rust and see where we can go with this language.

Setup

The first thing I need to do is download and install Rust on my local development machine. The recommended way to do this is through a tool called rustup, and the Rust website says to run the following command:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

I did all of this on a Linux machine, and the above command downloaded the Rust toolchain and its package management tool, Cargo, and installed them on my machine. Now that I have the compiler, I have to test it out to see if the install was successful, and the easiest way to do that is to compile the famous “Hello, world!” in Rust. Below is what that program looks like:

fn main() {
    println!("Hello, world!");
}

Copy the above into a file called hello.rs. This is a Rust source file. To compile it into an executable, run the command rustc hello.rs in a terminal. This may take a little time depending on the setup, but the resulting executable hello will be produced in the same directory. Running the command ./hello prints the text Hello, world! to the terminal. Woo hoo! This confirms that our download and install was successful, so we can start working on the hex dumper.

Cargo

Cargo is Rust’s package manager, and it really is an awesome tool. Going into it is far beyond the scope of this article, but I will be using it to manage this build and project.

To initialize a Cargo project:

cargo new rhexdump

This will create a directory called rhexdump containing a Cargo.toml manifest and a src folder with a starter main.rs, and by default it also initializes a git repository there; this directory is referred to as the project root. All of the Rust source code will live in the rhexdump/src folder. In order to test out the program, run the command:

cargo run

Run this from the project root in a terminal. Cargo will compile and run the program and log everything to the terminal. Let’s start writing some Rust!

The Big Picture

The GitHub gist above is the entirety of the program, and we will be dissecting it and stepping through the different parts and examining them.

The Breakdown

Let’s start with the main function:

let args: Vec<String> = env::args().collect();

if args.len() < 2 {
    panic!("Not enough arguments!");
}

Here, we start off by collecting all of the arguments passed to the program and putting them into a vector of String types. If fewer than 2 arguments were passed (the program name counts as the first argument, hence the 2), we panic and stop program execution.
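A panic gets the job done, but its message is a bit noisy for a command line tool. As a rough alternative, here is a minimal sketch that pulls the path out of the argument list and hands back a usage message when it is missing. The helper name path_from_args and the wording of the message are my own assumptions for illustration; they are not part of the program above.

```rust
/// Hypothetical helper (not in the original program): returns the first
/// real argument, or a usage message if none was supplied.
fn path_from_args<I: Iterator<Item = String>>(mut args: I) -> Result<String, String> {
    // The first item is the program name; fall back to a default if it is absent.
    let program = args.next().unwrap_or_else(|| String::from("rhexdump"));
    // The second item, if present, is the file we were asked to dump.
    args.next().ok_or_else(|| format!("usage: {} <file>", program))
}
```

In main, this could be called as path_from_args(env::args()), matching on the result and printing the Err text with eprintln! instead of panicking.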

let mut file_to_read = get_file(String::from(&args[1]));

Next, we need a File structure that we can read from, and we get one by calling the get_file function that I defined above:

fn get_file(path_to_file: String) -> File {
    match File::open(path_to_file) {
        Ok(f) => File::from(f),
        Err(e) => {
            // panic! expects a format string; a bare panic!(e) is rejected in the 2021 edition
            panic!("{}", e);
        }
    }
}

The get_file function is pretty straightforward; it takes a String argument that represents a path to a file. The -> File indicates that this function returns a File structure.

match File::open(path_to_file) {
    Ok(f) => File::from(f),
    Err(e) => {
        // panic! expects a format string; a bare panic!(e) is rejected in the 2021 edition
        panic!("{}", e);
    }
}

Here, we perform a match operation. For readers coming from other languages, this is similar to a switch statement. The operation attempts to open a File by using the path_to_file argument provided, and this returns a Result. If the operation was successful, it returned an Ok(file_struct), and we use the returned file_struct to create our File object. If the operation returned an Err, we panic once again, this time with the error itself as the message.
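For comparison, here is a small sketch of the other common way to deal with a Result: passing the error back to the caller instead of panicking. The function name try_get_file is hypothetical and is not something the program above uses.

```rust
use std::fs::File;
use std::io;

/// Hypothetical variant of get_file that hands the error back to the caller.
fn try_get_file(path_to_file: &str) -> io::Result<File> {
    // `?` returns early with the Err value; on success it unwraps the File.
    let file = File::open(path_to_file)?;
    Ok(file)
}
```

The caller then decides what a failure means, for example printing the error with eprintln! and exiting, which keeps the helper free of any panics.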

let mut buff = [0; GLOBAL_BUFFER_LENGTH];
let mut offset: usize = 0;

Here, I create a mutable buffer of length GLOBAL_BUFFER_LENGTH and fill it with 0’s. I also initialize a mutable variable called offset to 0; it tracks how far into the file we are.

loop {
    let bytes_read = file_to_read.read(&mut buff);
    match bytes_read {
        Ok(number) => {
            if number == 0 {
                break;
            } else {
                println!("{:08x}: {:40} {:10}",
                    offset,
                    get_hex_rep(&mut buff[0..number]),
                    get_ascii_representation(&mut buff[0..number]));
                // advance by the bytes actually read; read may return fewer than the buffer holds
                offset += number;
            }
        },
        Err(why) => {
            eprintln!("rhexdump: {}", why);
            break;
        }
    }
}

This loop is the real meat of the main function, and we’ll start off by breaking this down from the top on down.

loop {
    ...
}

This is Rust’s loop construct, so this block of code executes until a break occurs within it.

let bytes_read = file_to_read.read(&mut buff);

This line of code reads some data from the file into the buffer we created further up the program. The read call returns a Result; when it succeeds, the Ok value holds the number of bytes that were read, which can be less than the size of the buffer.
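To make the behavior of read a little more concrete, the sketch below records what successive calls return. It uses an in-memory Cursor in place of a real file so it can run anywhere, and the 16-byte buffer is an assumed value for GLOBAL_BUFFER_LENGTH; read_sizes is a hypothetical name used only for this illustration.

```rust
use std::io::{Cursor, Read};

/// Collects the byte counts returned by successive `read` calls.
/// The 16-byte buffer is an assumed value for GLOBAL_BUFFER_LENGTH.
fn read_sizes<R: Read>(mut source: R) -> Vec<usize> {
    let mut buff = [0u8; 16];
    let mut sizes = Vec::new();
    loop {
        match source.read(&mut buff) {
            Ok(0) => break,         // zero bytes means end of input
            Ok(n) => sizes.push(n), // n may be smaller than the buffer
            Err(_) => break,        // give up on I/O errors in this sketch
        }
    }
    sizes
}
```

With 20 bytes of input, for instance read_sizes(Cursor::new(vec![0u8; 20])), the calls report 16 bytes, then 4 bytes, and then 0 to signal the end.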

match bytes_read {
    Ok(number) => {
        ...
    },
    Err(why) => {
        eprintln!("rhexdump: {}", why);
        break;
    }
}

This match block is very similar to what was done in the get_file function. If the operation returned an Ok with the number of bytes read, we continue to process the data. In the event we had an Err, we output the error to the standard error stream and break out of the loop.

if number == 0 {
    break;
} else {
    println!("{:08x}: {:40} {:10}",
        offset,
        get_hex_rep(&mut buff[0..number]),
        get_ascii_representation(&mut buff[0..number]));
    // advance by the bytes actually read; read may return fewer than the buffer holds
    offset += number;
}

We had an Ok returned, so we have some data that has to be processed. We first check to see if the number of bytes we read was 0. If we did not read any bytes, we are done and just break out of the loop.

If our number of bytes read was not 0, we use the println! macro to print the current offset, a hexadecimal representation of the bytes we just read, and an ASCII representation of those same bytes. We then advance the offset so the next line is labeled correctly.
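The format string carries most of the layout work, so here is a small sketch that isolates it. The format_line helper is hypothetical; it only wraps the same format string in format! so the padding can be inspected.

```rust
/// Hypothetical helper: builds one output line with the same format string
/// that the println! call uses.
fn format_line(offset: usize, hex: &str, ascii: &str) -> String {
    // {:08x} prints the offset as 8 zero-padded lowercase hex digits
    // {:40}  pads the hex column with spaces to at least 40 characters
    // {:10}  pads the ASCII column with spaces to at least 10 characters
    format!("{:08x}: {:40} {:10}", offset, hex, ascii)
}
```

Calling format_line(16, "4865 6c6c 6f", "Hello") gives a line that starts with 00000010: 4865 6c6c 6f, followed by padding so that the ASCII column lines up with the rows above it.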

First, we’ll look at the get_hex_rep function:

fn get_hex_rep(byte_array: &mut [u8]) -> String {
    let build_string_vec: Vec<String> = byte_array.chunks(2)
        .map(|c| {
            if c.len() == 2 { format!("{:02x}{:02x}", c[0], c[1]) }
            else { format!("{:02x}", c[0]) }
        }).collect();
    build_string_vec.join(" ")
}

This function contains some of the typical Rust functional operations with iterators and mappings. This may look foreign to some programmers, but seasoned Rustaceans will understand it.

The function starts off by stating that the processed data will be stored in a Vec<String>. We then walk over the data in the byte_array variable in chunks of 2 bytes and apply a mapping closure to each chunk. If the chunk has a length of 2, we use the format! macro to create a String holding the two-digit hexadecimal form of both bytes. If the chunk does not have a length of 2, which can only happen for the last chunk of an odd-length slice, we format just the first byte. We then collect all of the results into a vector. Finally, we create our returned string by calling the join function on the Vec<String>, which puts a space between the groups.
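To see the grouping in action, here is a compact variant written as a sketch. It borrows the bytes immutably and formats each chunk without the length check; both of those choices, along with the name hex_rep, are my own assumptions for illustration and differ from the function above.

```rust
/// Variant of get_hex_rep for illustration: takes &[u8] rather than &mut [u8]
/// and formats however many bytes each chunk holds.
fn hex_rep(bytes: &[u8]) -> String {
    bytes
        .chunks(2) // groups of two bytes; the last group may hold just one
        .map(|pair| pair.iter().map(|b| format!("{:02x}", b)).collect::<String>())
        .collect::<Vec<String>>()
        .join(" ")
}
```

For the five bytes of "Hello", hex_rep(b"Hello") returns 4865 6c6c 6f: two full groups followed by a single leftover byte.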

That was how we created the hexadecimal representation; we still need to create the ASCII representation, and we take care of that by using the get_ascii_representation function.

pub fn get_ascii_representation(byte_array: &mut [u8]) -> String {
    let build_string_vec: Vec<String> = byte_array.iter().map(|num| {
        if *num >= 32 && *num <= 126 { (*num as char).to_string() }
        else { '.'.to_string() }
    }).collect();
    build_string_vec.join("")
}

get_ascii_representation is very similar to get_hex_rep in its layout. We begin by creating an iterator over the byte_array slice, and we apply a mapping closure to its contents. Here we check to see if the byte is in the printable ASCII range (32 through 126); if it is, we convert the byte to a character and then create a string out of it. If the byte is not in the printable range, we substitute a "." string for it. We then collect all of these results into a Vec<String>. Finally, we call join on the vector with an empty separator to create and return our String representation.
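As a quick illustration of the substitution rule, the sketch below applies the same range check but collects the characters straight into a String, skipping the intermediate vector. The name ascii_rep and the immutable borrow are assumptions made for this example and are not how the function above is written.

```rust
/// Variant of get_ascii_representation for illustration: same printable-range
/// check, but the chars are collected directly into a String.
fn ascii_rep(bytes: &[u8]) -> String {
    bytes
        .iter()
        // 32..=126 covers the printable ASCII characters, space through '~'
        .map(|&b| if (32..=126).contains(&b) { b as char } else { '.' })
        .collect()
}
```

Given the bytes of "Hi" followed by a newline, a zero byte, and a tilde, ascii_rep(b"Hi\n\x00~") returns Hi..~ because the newline and the zero byte fall outside the printable range.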

Wrap-Up

That concludes our analysis and review of my version of the xxd Linux hex dump utility written using the Rust programming language. Rust has so many good things going for it, and I know that more and more people and organizations will be looking to utilize it in their projects.
