A foundation in Julia

Part one of a three-part series in which I swear I was going to keep it brief but ended up writing a staggeringly long set of notes on learning a bit of Julia
Julia
Author
Published

March 1, 2024

After many years of procrastination and telling myself I’ll get around to it later, I’ve finally decided that now is the time for me to start learning Julia. At this point in my life I am strong in R, passable in Javascript, and can survive in SQL, C++ and Python if I need them for something. But despite my interest-from-afar in Julia, I haven’t had much of an excuse to dive into it before.

Part of the appeal in Julia is that it’s designed to be a high-performance language for scientific computing. Like other scientific languages (e.g., R, Matlab, etc) it has 1-based indexing rather than 0-based indexing (Python, C++, etc). Julia code is automatically compiled giving you performance that is comparable to compiled languages like C++, without the hassle of actually having to deal with the compiler yourself. But we’ve all heard the sales pitch for Julia before, there’s no need for me to repeat it here, and anyway I kinda just want to dive into the code.

All the images in this post are screenshots from the Foundation TV show, which is very loosely based on the Isaac Asimov novels of the same name but actually remembers that women can exist in the world and do things, even in a world of fiction

Getting started

First things first. In order to get started I had to go through the process of installing Julia, which was pretty straightforward. Getting it to work within my quarto blog was a bit trickier, but there’s some fairly decent documentation on Julia for Quarto which got me there. After getting it set up it was as simple as including this line in the YAML header for this post,1

jupyter: julia-1.10

and then creating executable Julia code cells by appending {julia} after the triple-fence used to define a block. So let’s see. Is Julia working in my quarto environment? I’ll start with my usual variant on the traditional “hello world” program using the println() (i.e., “print line”) function:

println("hello cruel world")
hello cruel world

Yes, that seems to be working, as – shockingly – is the ability to do some basic calculations using aritmetic operators that seem pretty much the same as most languages I use:

24 * 7
168

I can define variables, using = as the assignment operator:

hours = 24;
days = 7;

The semicolons here are optional: they’re used as end-of-line delimiters, but the main reason I’ve used them in the code chunk above is to suppress printing the return value of these assignments.

So yes, we are up and running.

Object types

Julia is a dynamically typed language, so when I defined the hours variable earlier I was able to create an integer without explicitly defining it as such:

typeof(hours)
Int64

By default Julia creates a 64-bit integer, but – unlike R and more like C++ – there are several integer types. If I’d wanted to create a 128-bit integer to represent the number of minutes in an hour (but why????) I could have done so by declaring the type explicitly:

minutes::Int128 = 60;
typeof(minutes)
Int128

So while minutes and hours are both integers they are different types, and – as you would expect – are represented differently internally. In an extremely strict language, it would not be possible to multiple minutes by hours without first converting at least one of them to a different type, but thankfully Julia operators will automatically promote to common type and so I can calculate the number of minutes in one day without doing the tedious type conversions myself:

typeof(minutes * hours)
Int128

You can see the same mechanism in action when I try to calculate the number of minutes in 1.7 days. The minutes variable2 is a 64-bit integer, the hours variable is a 128-bit integer, but the value of 1.7 is represented as a 64-bit floating point numbers. So when I compute minutes * hours * 1.7, the return value is a 64-bit float:

typeof(minutes * hours * 1.7)
Float64

The big floating Vault thingy. Much more dramatic than the book version, I guess, but at the same time I kind of feel they use it as a Seldon Ex Machina a bit too much. Not my favourite innovation in the show

Vectors

I find myself liking the syntax Julia uses to create objects. You can create a vector using square brackets like this, which feels very much like Matlab to me:3

words = ["hello", "cruel", "world"];

The words variable I’ve just created is a vector of three strings:

typeof(words)
Vector{String} (alias for Array{String, 1})

Subsetting uses square brackets too, and as I mentioned earlier indexing in Julia starts at 1:

words[1]
"hello"

A couple of other things to note here. In Julia, you need to be more careful about single versus double quotes than you would be in R (where they are interchangeable). In Julia, single quotes are used to define a single character (e.g., 'h' is a character), whereas double quotes are used to define a string (e.g. "hello" is a string). Strings are in fact a vector of characters, so "hello"[1] returns the character 'h'. But whatevs. Let’s move along.

Tuples

I have no intention of diving too deeply into object types in Julia, but there are two more that I feel are worth mentioning at this point: tuples and dictionaries. Let’s start with tuples. A tuple is simply an ordered collection of values, and are constructed using parentheses:4

fruit = ("apple", "banana", "cherry")
("apple", "banana", "cherry")

Dictionaries

In contrast, a dictionary5 is a list of key-value pairs. There’s a few different ways to define a dictionary but I’m partial to this syntax:

danielle = Dict(
  "name" => "danielle",
  "age" => 47,
  "gender" => "female",
  "boring" => true
)
Dict{String, Any} with 4 entries:
  "name"   => "danielle"
  "boring" => true
  "gender" => "female"
  "age"    => 47

The entries in a dictionary can be indexed using the keys:

danielle["gender"]
"female"

Not even remotely a storyline in the books, obviously. I mean, the Cleonic dynasty was not a thing in the books, but the whole arc involves women playing politics, and also having sex sometimes

Functions

The syntax for defining functions in Julia comes in a couple of forms. The usual way to do it is using the function keyword, and I could define a simple greet() function like this:

function greet(name) 
  "hello $name, nice to meet you"
end;

The end keyword is required here. Note also that I’ve taken advantage of Julia’s string interpolation syntax to substitute the value of name into the string that greet() outputs:

greet("danielle")
"hello danielle, nice to meet you"

You can also create functions using the anonymous function syntax (e.g., x -> "hello $x" defines an anonymous function), which is handy in the functional programming context if you want to map a vector of values onto another vector using map():

map(x -> "hello $x", ["amy", "belle", "chiara"])
3-element Vector{String}:
 "hello amy"
 "hello belle"
 "hello chiara"

In this case though I didn’t really need to resort to using map() because Julia also allows you to vectorise a function, using . to “broadcast” a scalar function to accept vector inputs:

greet.(["amy", "belle", "chiara"])
3-element Vector{String}:
 "hello amy, nice to meet you"
 "hello belle, nice to meet you"
 "hello chiara, nice to meet you"

I can see that being handy.

I’ll come back to functions momentarily in order to talk about generic functions and method dispatch in Julia, but first I’ll pivot a little to talk about packages.

Packages

As with any programming language, most of the power comes in Julia comes from the extensive collection of packages that other users have contributed. The usual way to install a package is via the Julia REPL.6 The Julia REPL is a little unusual in that it has several different “modes”. Normally your command prompt in the Julia REPL looks something like this:

julia>

But if you type ] at the REPL you’ll see it transform into something like this:7

(@v1.10) pkg>

This tells you that you’ve entered “package” mode, and you can type commands that can be used to install Julia packages and various other things.8 9 (If you want to get out of package mode and return to the regular REPL press “backspace”.)

So then, if you want to install the JSON package, the command you’d type at the REPL in package mode would simply be add JSON. And having installed the JSON package into my Julia environment, I can load it using the using keyword:

using JSON

And now I can read the “praise.json” file that I just so happen to have sitting in my working directory by calling JSON.parsefile()

praise_dict = JSON.parsefile("praise.json")
Dict{String, Any} with 3 entries:
  "exclamation" => Any["ah", "aha", "ahh", "ahhh", "aw", "aww", "awww", "aye", …
  "superlative" => Any["ace", "amazing", "astonishing", "astounding", "awe-insp…
  "adverb"      => Any["beautifully", "bravely", "brightly", "calmly", "careful…

Most convenient, because now that I have this praise_dict object I’m going to use it in the next section when I return to talking about functions…

Probably a smart move by the showrunners to use pretty pictures and cool tech as a way of representing a mathematical discipline like psychohistory, but… also it’s very silly

Methods

One of my favourite little R packages is praise, which you can use to create random snippets of positive feedback that can be inserted in various places. Inspired by this, I’m going to define a cute little praise() function that does something similar.

In the last section I defined praise_dict, a handy dictionary that contains some adverbs, superlatives, and exclamations that you can use to construct random praise statements. So let’s define praise() such that it takes the name of a person as a string, and outputs a piece of positive feedback:

function praise(name::String)
    hey = rand(praise_dict["exclamation"])
    sup = rand(praise_dict["superlative"])
    adv = rand(praise_dict["adverb"])
    "$hey $name you are $adv $sup"
end;

praise("danielle")
"wow danielle you are wisely wicked"

Oh, that’s so sweet of you to say. Notice, however, that I’ve been a little stricter in how I’ve defined the input arguments for praise() than I was earlier when I defined greet(). The praise() function won’t work if the name argument is not a string:

praise(103)
MethodError: no method matching praise(::Int64)

Closest candidates are:
  praise(::String)
   @ Main In[22]:1

That’s probably a good thing. We don’t typically provide praise to an integer, so it makes sense that the function doesn’t work when you pass it an integer!

On the other hand, we might want our praise() function to work if the user doesn’t pass it a name at all. To accomplish that, we can write another praise() function that doesn’t take any arguments:

function praise()
    hey = rand(praise_dict["exclamation"])
    sup = rand(praise_dict["superlative"])
    adv = rand(praise_dict["adverb"])
    "$hey you are $adv $sup"
end;

So now this works:

praise()
"mmh you are bravely good"

The key thing to notice here is that though I’ve defined praise() twice, what Julia actually does in this situation is construct a single “generic” function that has two methods. In other words, praise() will work if you pass it a single string, and it will also work if you don’t pass it any arguments at all. It won’t work for any other kind of input. On the surface that seems pretty sensible, but in practice we might need a third method. Suppose I have a vector where there are a few people’s names listed, but it has missing values:10

people = ["alex", missing, "fiona"];

My praise() function isn’t inherently vectorised, but of course we can use the . syntax to praise several people at once and call praise.(people). Unfortunately this work right now because praise() doesn’t know what to do with the missing value. So if we want our praise() function to handle missing data gracefully it needs a third method:

function praise(name::Missing)
    hey = rand(praise_dict["exclamation"])
    sup = rand(praise_dict["superlative"])
    adv = rand(praise_dict["adverb"])
    "$hey you are $adv $sup"
end;

Now that we have all three methods praise() works just fine:

praise.(people)
3-element Vector{String}:
 "mmhm alex you are correctly hunky-dory"
 "yeah you are truthfully primo"
 "mhm fiona you are neatly lovely"

As an aside, if you ever needed to find out what methods have been defined for the praise() function, you can do so by calling methods(praise).

The jump ships are just cool. No further comment needed

Piping

Much like recent versions of R, Julia comes with a piping operator |> that you can use to pass the output of one function to another one. So let’s say I have some numbers stored as vals, and I want to compute their geometric mean:

vals = [10.2, 12.1, 14.3]
3-element Vector{Float64}:
 10.2
 12.1
 14.3

In Julia mean() is part of the Statistics package, so we’ll load that:

using Statistics

To compute the geometric mean, we first compute the natural logarithm for each element in vals using log(), compute the arithmetic mean of those log-values with mean(), and then exponentiate the result with exp(). Written as a series of nested function calls, it looks like this:

exp(mean(log.(vals)))
12.084829472557535

As has been noted many times in the past, one awkward feature of code written in this form is that you have to read it from the inside (innermost parentheses) to the outside in order to understand the sequence of events: first you take vals and pass it to log.(), then you take these logarithms and pass them to mean(), and then you take this mean and pass it to exp(). In this specific case it’s not terrible to read, because it just so happens that “exp mean log value” is pretty much how the formula for the geometric mean is written mathematically, but most data oriented programming isn’t structured to look exactly like an equation, and “inside out” code quickly becomes difficult to read.

This is where the “pipe” operator |> comes in handy. You start with an object on the left hand side, and then pass it to the function named on the right hand side. When you chain a series of piping operations together you end up with code that reads left-to-right rather than inside-out:

vals .|> log |> mean |> exp
12.084829472557535

Notice that like other operators, I can use . to broadcast when using the pipe.

Much like R, Julia has multiple versions of the pipe. For the purpose of this post I’m only going to talk about the base pipe, which is much much stricter than the magrittr pipe %>% in R, and indeed considerably stricter than the base pipe |> in R.11 As you can see from the code above, the right hand side of the pipe is a function, not a call. The object supplied on the left hand side of the pipe is passed as the first argument to the function. No additional arguments can be supplied to the function on the right.

On the surface this seems very restrictive, but the longer I’ve been playing with Julia the more I realise it’s not as restrictive as I first thought. Because Julia makes it very easy to write anonymous functions, and because there’s very little overhead to calling one, you can write a pipeline that consists entirely of calls to anonymous functions. As a very simple example of a “split, apply, combine” style workflow constructed with the Julia pipe, here’s how you could use this to reverse each of the individual words in a string:

"hello cruel world"  |>
  x -> split(x, " ") |>
  x -> reverse.(x)   |>
  x -> join(x, " ")
"olleh leurc dlrow"

To make this work I really do need to be able to specify additional arguments to split() and join(), which would not be permitted in a simpler application of the Julia pipe, but it works perfectly well here because those additional arguments are specified inside the anonymous functions to which the inputs are piped.

Honestly, as much as I was initially like “ugh this is unwieldy”, I’m starting to appreciate the simplicity of the design and how it really does force you to start thinking about your pipelines in functional programming terms.12

Function composition

I should also mention that Julia has the function composition operator that you can use for this purpose, using much the same notation as in mathematics.13 So I could define a geomean() function as the following composition:

geomean = exp  mean  (x -> log.(x))
geomean(vals)
12.084829472557535

In this expression I’ve used an anonymous function as the third function to be composed so as to ensure that if the user passes a vector such as vals, the default behaviour of geomean() is to broadcast the call top log() (i.e., compute the log of each input individually), then pass the resulting vector of logarithms to mean() as a vector, and then pass the resulting scalar to exp().

To be honest, as cute as this is, I’m not sure I see much utility to this right now. So yeah, time to move onto the last “topic” in this foundations post, in which the author will mention but in no way actually explain the extensive capabilities that Julia has for allowing…

Gender-swapping Daneel to Demerzel was a good move in the show just for the sake of helping to mitigate the sausage-fest that the novels presented, but also I kind of think Demerzels story is just more interesting

Metaprogramming

Much like R – and very unlike Matlab, which Julia syntax sometimes resembles – the design choices underpinning Julia has been influnced heavily by Lisp. While I have never actually learned to program in any of the major dialects of Lisp, I’ve always wanted to, and I’m a huge fan of the way that Lisp and its descendants contain programming constructs that directly represent abstract syntax trees and provide tools that let you manipulate user-supplied code.14

Because Julia Metaprogramming is such a powerful tool, what I’ve noticed already – even as a novice – is that most practical uses of the language end up relying on it heavily. Julia supports abstract Symbols, Expressions, and Macros, all of which start to pop up in your code once you start using it for real world data wrangling and visualisation. So it’s pretty important to understand something about how it all works. That said… it’s an advanced topic rather than a basic one, so what I think I’m going to do for now is issue a promissory note: I’ll talk more about this in later posts as those topics become relevant.

Where to next?

Very obviously, I skipped a lot of foundational topics in this post. This is not in any sense a coherent or complete introduction to Julia programming. I mean, I didn’t even bother to talk about control flow, which is one hell of an omission. But my goal here isn’t to provide a complete overview, and perhaps surprisingly I don’t actually use loops or if/then conditionals at all in the next two posts, so I simply haven’t bothered to write anything about those topics here. I focused on the things that popped up as I went about trying to try out a few things.

In any case, if you’re curious about where this is about to go, the second post in this series will talk about data frames and data wrangling, while the third post will take a look at a data visualisation tool. Which, quite frankly, is a lot more than I’d intended to do when I had this idea – which increasingly feels ill-advised – to play around with Julia and write about the experience.

Footnotes

  1. Okay that’s only half true. The other thing I ended up doing was creating a project environment for this blog post, and if you look at the source code for this post you can see that I’ve actually used Pkg.activate() to ensure that the Julia code in this post is executed using that environment. There’s a nice blog post on setting up a project environment here, but it’s a bit beyond the scope of what I want to cover here.↩︎

  2. Technically, the value referenced by the minutes variable: values have types, variables are simply labels that point to values. But I shan’t be bothered with that distinction here.↩︎

  3. Fun fact. Apart from a brief period in undergraduate where I learned a little bit of C, Matlab was my first programming language. But it’s been a very, very long time since I used Matlab – or GNU Octave – for anything. I imagine I could pick it up again if I had to but I honestly don’t see the point.↩︎

  4. A tuple is an immutable type, so the idea here is that you’re really supposed to use tuples to represent list of values that doesn’t change.↩︎

  5. Dictionaries are mutable, so you can modify values stored in a dictionary.↩︎

  6. REPL = “Read-evaluate-print loop”. It’s a fancy way of referring to the command line I guess. In R we’d usually refer to the REPL as the R console, but other languages tend to use the term REPL.↩︎

  7. There are other modes besides regular and package. For instance if you type ? at the REPL it takes you into help mode.↩︎

  8. You don’t actually have to do it this way. The “package” mode in the REPL exposes various functions from the Pkg package, so if you have loaded Pkg then you could totally call Pkg.add() to install a package. In practice I find this a bit silly, but I suppose it has more useful applications in activating an environment via Pkg.activate() etc.↩︎

  9. The syntax here is meaningful. If you are working in the base Julia environment, the bit in parentheses tells you that if you add a package it will be added to the base environment. For this blog post, however I’m using a custom environment called “sandbox” that has the packages I’m using, so the prompt I would see looks like this: (sandbox) pkg>.↩︎

  10. The missing value is used to define missing data in Julia, analogous to how NA is used to define missing values in R.↩︎

  11. If you are interested in such things, the Pipe package supplies a pipe that is very similar to the R base pipe.↩︎

  12. So much so, in fact, that while my original plan for these Julia posts was to briefly dispense of the base pipe and spend more time talking about the Pipe package, I think I’m going to skip the package entirely and just use base pipe + anonymous functions. ↩︎

  13. For most editors that are configured to handle Julia syntax can type the operator by typing \circ and then hitting tab.↩︎

  14. Very often I see programmers who have never worked in a Lisp-descended language (e.g., they know Python, C, etc. but not R, Julia, Scheme, etc.) react in horror and outrage at the things that you are permitted to do in languages that rely extensively on metaprogramming, but honestly I love it. I think it’s such a powerful tool for constructing domain specific languages within the confines of a more general language.↩︎

Reuse

Citation

BibTeX citation:
@online{navarro2024,
  author = {Navarro, Danielle},
  title = {A Foundation in {Julia}},
  date = {2024-03-01},
  url = {https://blog.djnavarro.net/posts/2024-03-01_julia-foundation},
  langid = {en}
}
For attribution, please cite this work as:
Navarro, Danielle. 2024. “A Foundation in Julia.” March 1, 2024. https://blog.djnavarro.net/posts/2024-03-01_julia-foundation.