Dainius Masiliūnas, Loïc Dutrieux, Jan Verbesselt, Johannes Eberenz

2025-08-18

WUR Geoscripting

Learning objectives

  • Understand the structure of a package in R and Python

What are packages?

Packages extend the functionality of a programming language by providing new functions (and/or datasets). There is a huge variety of packages in both R and Python. But how are packages made?

In essence, packages are nothing more than bundles of scripts! Typically, these scripts provide functions: small pieces of code that each do one particular task, usually with a name and input arguments. By loading a package, we simply place these functions in what is called the global environment: a part of your computer memory that holds your variables, function definitions, datasets etc. for the currently running session of R or Python.

Packages are formalised by a set of rules that all developers have to abide by. One important rule is that the action of merely loading a package should not cause harm to the user (or their environment): functions can be loaded into the global environment, but should not overwrite what is already there, or run any code (which could potentially delete files etc.). Therefore, packages typically consist of script files with nothing other than function definitions. Running those functions is left up to the users in their own scripts. How to run them is documented using examples, either in comments next to the functions, or in an external help file, but never directly as executable code.
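In Python, this rule is commonly followed by keeping only definitions at the top level of a file and placing anything that should actually run under an `if __name__ == "__main__":` guard. The sketch below is a hypothetical module (the name `greetings.py` and the function `greeting()` are made up for illustration), with a usage example that lives in the documentation string rather than in executable code:

```python
"""greetings.py: a hypothetical package module that contains only definitions."""


def greeting(name="world"):
    """Return a greeting for `name`.

    Example (documentation only; nothing here runs when the module is imported):

        >>> greeting("R")
        'hello R'
    """
    return f"hello {name}"


# Code under this guard runs only when the file is executed as a script,
# never when it is imported as part of a package.
if __name__ == "__main__":
    print(greeting())
```

Importing this file defines `greeting()` and does nothing else; calling it is left to the user.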

In addition to the scripts (and their documentation), packages typically include some metadata, such as the name of the author, package version, what other packages are needed to run this package, etc. In R, this metadata is mandatory, whereas in Python it’s to some extent optional.
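For packages that are already installed, Python exposes this metadata through the standard `importlib.metadata` module. The following is a minimal sketch; the helper name `describe()` is an assumption made for this example, and `"MyPackage"` is only a placeholder that will not be found unless such a package is installed:

```python
from importlib import metadata


def describe(dist_name):
    """Return basic metadata of an installed package, or None if it is absent."""
    try:
        info = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return {
        "name": info.get("Name"),
        "version": info.get("Version"),
        "author": info.get("Author"),  # None if the package does not declare it
        "requires": metadata.requires(dist_name) or [],  # declared dependencies
    }


# Prints None unless a package called "MyPackage" is installed.
print(describe("MyPackage"))
```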

Packages also typically follow a particular structure, i.e. where files are located and how. Let’s take a closer look at the package structure in both Python and R.

Package structure

In Python

A simple package (module) in Python is just a directory with a Python file in it. In Colab, the following shell commands will create a directory and a Python file containing a Hello World example:

! mkdir -p MyPackage
! echo "def helloworld():" > MyPackage/helloworld.py
! echo "    print('hello world')" >> MyPackage/helloworld.py

Now we can directly make use of our new helloworld() function in Python:

import MyPackage.helloworld as mphw

mphw.helloworld()

In the import name, MyPackage is the name of the directory, and helloworld is the name of the file. We rename MyPackage.helloworld to mphw for brevity, but it’s optional.
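The import works because Python searches a list of directories (`sys.path`) for a directory or file matching the import name, and in a notebook the current working directory is part of that list. The sketch below reproduces the same steps in pure Python inside a temporary directory, so that nothing in your project is touched; the name `DemoPackage` is hypothetical and chosen to avoid clashing with the package built in this tutorial:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Work in a throwaway directory.
workdir = Path(tempfile.mkdtemp())

# Same layout as above: a directory containing one Python file.
package_dir = workdir / "DemoPackage"
package_dir.mkdir()
(package_dir / "helloworld.py").write_text(
    "def helloworld():\n    print('hello world')\n"
)

# Python only finds the directory if its parent is on the import search path.
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()

# Equivalent to: import DemoPackage.helloworld as mphw
mphw = importlib.import_module("DemoPackage.helloworld")
mphw.helloworld()
```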

Well, that was easy!

However, our package is technically not a complete package yet, as it can only be used if we copy the directory into every code project that needs our function. A true package can be installed and then used from any place on your computer. Thankfully that’s also easy to do! All we need is to provide a bit more structure and metadata.

The metadata file in Python is called pyproject.toml, and it has to be in the top-level directory of the project, which is named after the package as it will appear in the list of installed packages. The actual functions need to be in a subdirectory of that directory, named after the package as it will be imported in Python. Typically, both of these names should be the same, otherwise users will get confused. Lastly, if we want to be able to import the package itself to get our functions, without having to specify the name of the file that holds them, the script with our functions should be called __init__.py.

In summary, the package structure looks like this:

MyPackage
├── MyPackage
│   └── __init__.py
└── pyproject.toml

Let’s make our files match this structure! First, make the subdirectory MyPackage/MyPackage, and then rename our helloworld.py to MyPackage/MyPackage/__init__.py:

! mkdir MyPackage/MyPackage
! mv MyPackage/helloworld.py MyPackage/MyPackage/__init__.py
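To see what `__init__.py` changes, the following sketch builds the same two-level layout in a temporary directory and imports the package by its directory name alone. The name `DemoInitPackage` is hypothetical, and adding the outer directory to `sys.path` merely stands in for installing the package:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Outer directory (project root) and inner directory (the importable package).
project_root = Path(tempfile.mkdtemp()) / "DemoInitPackage"
import_dir = project_root / "DemoInitPackage"
import_dir.mkdir(parents=True)

# The functions live in __init__.py, so no file name is needed when importing.
(import_dir / "__init__.py").write_text(
    "def helloworld():\n    print('hello world')\n"
)

# Stand-in for installation: make the project root visible to the import system.
sys.path.insert(0, str(project_root))
importlib.invalidate_caches()

import DemoInitPackage

DemoInitPackage.helloworld()
```

Python runs `__init__.py` when the directory is imported, which is why the function is reachable directly as an attribute of the package.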

Next, let’s add the project metadata. We only need to specify the name of the package and its version. To prevent confusion, use the same name as the name of the two directories.

! echo '[project]' > MyPackage/pyproject.toml
! echo 'name = "MyPackage"' >> MyPackage/pyproject.toml
! echo 'version = "0.1"' >> MyPackage/pyproject.toml

Bam, we now have a true package! We can install it with pip:

! pip install ./MyPackage

And now we can use our function from any directory simply as:

import MyPackage

MyPackage.helloworld()
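One way to confirm that Python can locate a package without actually importing it is `importlib.util.find_spec()`. A minimal sketch follows; the helper name `is_importable()` is an assumption for this example, and the second result depends on whether you installed MyPackage as described above:

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module or package of this name can be found."""
    return importlib.util.find_spec(module_name) is not None


print(is_importable("json"))       # a standard library module
print(is_importable("MyPackage"))  # True only if MyPackage has been installed
```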

In R

Packages in R are also nothing more than an organised collection of R scripts. Compared to Python, there are a few more files necessary to make an R package, but thankfully, we also have tools that help to create one!

Once again, let’s make a Hello World function and put it into a file:

echo 'HelloWorld <- function() print("Hello world")' > HelloWorld.r

Let’s make it into a package! We use the function package.skeleton() to autocreate a package structure:

package.skeleton(name = "MyPackage", code_files = "HelloWorld.r")
## Creating directories ...
## Creating DESCRIPTION ...
## Creating NAMESPACE ...
## Creating Read-and-delete-me ...
## Copying code files ...
## Making help files ...
## Done.
## Further steps are described in './MyPackage/Read-and-delete-me'.

Let’s follow the instructions and read the file MyPackage/Read-and-delete-me, and then delete it:

rm MyPackage/Read-and-delete-me

Let’s see what the R package structure looks like:

tree -n MyPackage
## MyPackage
## ├── DESCRIPTION
## ├── NAMESPACE
## ├── R
## │   └── HelloWorld.r
## └── man
##     ├── HelloWorld.Rd
##     └── MyPackage-package.Rd
## 
## 3 directories, 5 files

It’s a little bit more complicated than Python, but not by much. The script files are stored in a directory called R (as R packages can also include C++ code etc.), and our script file is already there. Next we have the DESCRIPTION file, which, like the pyproject.toml file we saw for Python, includes metadata about the package. You can open it and edit it. Lastly, we have the NAMESPACE file, which states which functions will be accessible to package users (as opposed to only internal to the package), and a man directory, which includes the documentation manual entries. We have two entries: HelloWorld.Rd, which is the description of our HelloWorld function, and MyPackage-package.Rd, which is a description of the package itself. These documentation files are written in an R variant of LaTeX.
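For comparison, Python has no NAMESPACE file. The closest conventions are the `__all__` list, which declares the names intended for users, and a leading underscore for internal helpers. Unlike in R, this is only a convention: internal names stay reachable. The sketch below writes a hypothetical module `demo_exports` to a temporary directory to illustrate this loose analogy:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a small module to a throwaway directory; "demo_exports" is a made-up name.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "demo_exports.py").write_text(
    '__all__ = ["hello"]  # names meant for users of the module\n'
    "\n"
    "def hello():\n"
    '    return _capitalise("hello world")\n'
    "\n"
    "def _capitalise(text):  # leading underscore marks an internal helper\n"
    "    return text.capitalize()\n"
)

sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
import demo_exports

print(demo_exports.__all__)  # the declared public names
print(demo_exports.hello())  # the public function uses the helper internally
```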

Let’s install our package! First we need to build it, from the terminal:

R CMD build MyPackage
## * checking for file ‘MyPackage/DESCRIPTION’ ... OK
## * preparing ‘MyPackage’:
## * checking DESCRIPTION meta-information ... OK
## * installing the package to process help pages
## * saving partial Rd database
## * checking for LF line-endings in source and make files and shell scripts
## * checking for empty or unneeded directories
## * building ‘MyPackage_1.0.tar.gz’

As you can see, the package is just a compressed archive (a gzipped tarball, .tar.gz) of our package files, plus some metadata cache. Now we can use R itself to install our new package:
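Because the built package is an ordinary .tar.gz archive, any tool that reads that format can inspect it. The sketch below uses Python's `tarfile` module on a stand-in archive that it creates itself in a temporary directory; the file contents are illustrative only, and this does not replace `R CMD build`, which also processes the help pages:

```python
import tarfile
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())

# Stand-in for a package source directory (contents are illustrative only).
package_dir = workdir / "MyPackage"
(package_dir / "R").mkdir(parents=True)
(package_dir / "DESCRIPTION").write_text("Package: MyPackage\nVersion: 1.0\n")
(package_dir / "R" / "HelloWorld.r").write_text(
    'HelloWorld <- function() print("Hello world")\n'
)

# Pack the directory into a gzip-compressed tar archive.
archive_path = workdir / "MyPackage_1.0.tar.gz"
with tarfile.open(archive_path, "w:gz") as archive:
    archive.add(package_dir, arcname="MyPackage")

# List the archive contents, as you could for the file made by R CMD build.
with tarfile.open(archive_path, "r:gz") as archive:
    members = sorted(archive.getnames())
print(members)
```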

install.packages("./MyPackage_1.0.tar.gz")
## Installing package into '/home/dainius/R/x86_64-suse-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
## inferring 'repos = NULL' from 'pkgs'

Perfect, we can immediately load it and use it:

library(MyPackage)

HelloWorld()
## [1] "Hello world"

As a bonus, we can also see the documentation we had in the man files:

?HelloWorld

That was also pretty easy!

To summarise, we have learned that packages in both Python and R are nothing more than scripts that contain functions, in a particular directory structure. If we start developing our projects with that structure in mind, we can make packages out of our scripts very easily!

Tip: Once you have a package, you can also have it uploaded to a package repository, so that others (or you from another computer) can easily download and run the code in the package. However, it does require you to have your code in good working order and be well documented, since, after all, it’s going to be public for everyone to see!

The package repository for Python is called PyPI, the Python Package Index. PyPI is developer-friendly, which means that you can upload your packages to PyPI without much hassle, as long as your project is in reasonable shape and declares all its dependencies properly. See the PyPI documentation to learn more about how to submit packages to PyPI.

The package repository for R is called CRAN, the Comprehensive R Archive Network. Unlike PyPI, CRAN is end-user-friendly. That means that submitting packages to CRAN is not nearly as easy, because it has much higher quality requirements for any (new) packages. All packages submitted to CRAN have to pass both automatic tests (using the R CMD check command) and human review. Your package will be rejected if it depends on a package version that is not currently on CRAN, if you fail to document any function parameter, make any typo anywhere or even fail to use quotes correctly. Therefore, the package submission process for CRAN is more akin to peer review for publishing a scientific paper. However, it ensures that packages always work together with other packages, which immensely simplifies package dependency management and allows users to simply use install.packages() and expect the new package to just work. Packages are also regularly removed from CRAN if they stop working due to updates in their dependencies, making sure that all packages on CRAN stay compatible with each other. See the CRAN documentation for more information about package submission requirements.

Project structure

Making packages is a great way to stay organized, keep track of what you are doing and be able to use it quickly and properly at any time. Packages have several advantages:

  • They are easy to share with others
  • Their dependencies are automatically imported and their functions are sourced (which reduces the risk of having broken dependencies)
  • Their documentation is attached to the functions and cannot be lost (or forgotten)
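In Python, documentation attached to a function takes the form of a documentation string (docstring) stored on the function object itself, so it travels wherever the function goes. A minimal sketch follows; the `name` argument is a hypothetical addition that the tutorial's `helloworld()` does not have:

```python
import inspect


def helloworld(name="world"):
    """Print a greeting.

    Parameters
    ----------
    name : str
        Who to greet (hypothetical argument, added for this example).
    """
    print(f"hello {name}")


# The documentation and the argument list can be retrieved at any time.
print(inspect.getdoc(helloworld))
print(inspect.signature(helloworld))
```

In an interactive session, `help(helloworld)` displays the same information, much like `?HelloWorld` does in R.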

For these reasons, if you build a package, next year you will still be able to run the functions you wrote yesterday. That is often not the case for standalone functions that are poorly documented and may depend on many other functions … that you cannot find any more. So to summarize, packages are not only a way to extend functionality, they are also the standard way to archive and save functions.

To make it easy to turn your code into a package, you need to stick to the project structure that packages use. Another benefit of maintaining a consistent project structure is that it will make it easier for you to switch from one project to another and immediately understand how things work.