Learning objectives
- Learn about graphical interfaces for programming in R and Python
- Understand the structure of a package in R and Python
- Adopt some good scripting/programming habits
- Learn how to find help with solving programming problems
Graphical user interfaces (GUIs) and integrated development environments (IDEs)
It’s time for us to actually delve into R and Python programming! But how do we do that efficiently? We could write scripts in Notepad and run them, but that is not particularly efficient, as there are many graphical user interfaces (GUIs) that can help us write code faster. Comprehensive GUIs that help with writing, debugging and packaging code are called integrated development environments (IDEs).
R
There are multiple IDEs for R. The most popular one is RStudio, as it is developed by a company that is very active in contributing to R (Hadley Wickham and others). RStudio is cross-platform and open-source. It comes in two types: RStudio Desktop is a regular desktop app, and RStudio Server is meant for running on a remote server, to which you can connect through your web browser. Interestingly enough, the desktop version is in fact just RStudio Server running locally, with an integrated web browser based on Google Chrome.
Even though RStudio is very popular, it is not the only IDE for R. RKWard is an alternative IDE, aimed at easier learning of R for people coming from other statistical software, such as SPSS. It includes menus for common statistical analysis algorithms, such as getting descriptive statistics, running statistical tests, and making plots. It also includes an editor for data tables. RKWard is an open-source native desktop app that is developed on Linux, but has also been ported to other platforms, including Windows.
Furthermore, there are cross-language GUIs/IDEs, such as Jupyter, whose name is a portmanteau of Julia, Python and R. Jupyter can run different language interpreters, what it calls kernels, for each script that is open, therefore scripts in multiple languages can be edited at the same time in the same interface.
One thing to keep in mind is that there is a distinction between a
programming language, such as R, and its IDEs, such as RStudio.
Which IDE you use is up to your own personal preference. Which IDE you
used to develop code does not matter, because in the end you just have
an R script that can be run on any of the IDEs, or directly from the
command line. What does matter is that the script is written in the R
language, as the users will need to have the R interpreter installed in
order to run the script. Likewise, the packages that you use in your
script are also important, as the users will need to install them before
they can run your script. Therefore, whenever you refer to scripts and
packages, for instance when writing a thesis, you need to specify what
language the scripts are written in, and what packages (and ideally what
versions) you used. You should not mention which IDE you used, as that
is irrelevant to the readers. For example, you should not write “the
scripts are written in RStudio”, but rather, you should write “the
scripts are written in R, using package lattice
version
0.22”. In fact, you should also cite the authors of the language and the
packages you used. R includes a handy function citation()
to help you cite:
## To cite R in publications use:
##
## R Core Team (2024). _R: A Language and Environment for Statistical
## Computing_. R Foundation for Statistical Computing, Vienna, Austria.
## <https://www.R-project.org/>.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {R: A Language and Environment for Statistical Computing},
## author = {{R Core Team}},
## organization = {R Foundation for Statistical Computing},
## address = {Vienna, Austria},
## year = {2024},
## url = {https://www.R-project.org/},
## }
##
## We have invested a lot of time and effort in creating R, please cite it
## when using it for data analysis. See also 'citation("pkgname")' for
## citing R packages.
It also works on packages:
## To cite package 'lattice' in publications use:
##
## Sarkar D (2008). _Lattice: Multivariate Data Visualization with R_.
## Springer, New York. ISBN 978-0-387-75968-5,
## <http://lmdvr.r-forge.r-project.org>.
##
## A BibTeX entry for LaTeX users is
##
## @Book{,
## title = {Lattice: Multivariate Data Visualization with R},
## author = {Deepayan Sarkar},
## year = {2008},
## publisher = {Springer},
## address = {New York},
## isbn = {978-0-387-75968-5},
## url = {http://lmdvr.r-forge.r-project.org},
## }
There are more IDEs, such as R Commander, but we will not be talking about them within the scope of this course. Nevertheless, feel free to use whichever IDE is your favourite!
As a sidenote, IDEs can change how certain commands in R work. For
instance, there are two commands in R that load packages:
library()
and require()
. Most IDEs make no
difference between the two; require()
just gives a warning
if it can’t load a package, whereas library()
stops with an
error. However, in RKWard, running require()
on a
package that does not exist will result in its package management
dialogue opening with the required packages preselected for
installation, and closing the dialogue will result in a successful
loading of the package (if it was installed successfully). Therefore, do
not use require(“package”)
to check if package
is installed, otherwise it will be installed twice under RKward.
Instead, you can use “package” %in% installed.packages()
.
Next, let’s look a bit more in detail at the IDEs described above and their configuration options.
RStudio
In the virtual machines provided in the course, RStudio is already installed for you. In case you are working on your own computer and would like to know how to install R and RStudio, see the RStudio website.
If you are new to RStudio take a look at the following summary YouTube video about how to use RStudio: Intro to RStudio (6 min). In it you will learn how to navigate the RStudio environment and even run some code. This will be helpful for later in the course.
Additionally, the first time you open RStudio it’s helpful to change some global settings. To do this go to Tools → Global Options… in the top menu bar of RStudio. The following box should appear. Set the settings to match that of the screenshot, namely, Never save the workspace to .RData file on exit and uncheck the boxes under History and Restore .RData. The effect is that RStudio will stop prompting you on exit about whether you want to save the workspace and reload it on next start, which will save you time answering prompts and save time opening RStudio. Most importantly, it will make sure that restarting RStudio leaves you with a clean environment, equal to what another user of your code would start with, so you can properly test your code, and restart RStudio in case something goes wrong to get back to a clean state.
To make a new script, click on File -> New File
-> R Script, and a new editor pane will open, allowing you
to write code. When you write a line, you can run each individual line
that your text cursor is on by hitting Ctrl+Enter. To install
packages, aside from using the R console below
(install.packages()
function), you can install them through
the graphical Packages pane at the lower right corner of the
screen, where you can search for and install packages and their
updates.
RKWard
RKWard is also preinstalled on the virtual machine. On your own computer, you can get it from the RKWard website.
When you launch RKWard, it will start a first-start setup wizard, that suggests installing some additional packages. Feel free to simply dismiss it by clicking Cancel. To start a new file, click Create -> Script File. Typically you can also use Ctrl+Enter to run each line that your cursor is on. RKWard sometimes gives too many autocompletion tooltips, you can disable them by going to Settings -> Configure Script Editor and unchecking Function call tip.
You can install packages, aside from using the
install.packages()
function in the R console below, through
Settings -> Manage R packages and plugins…. You
will be prompted to select a mirror, just use the top one
(0-Cloud) to automatically select the best one. The tab
Install / Update / Remove R packages lists all packages that
are available on CRAN, packages that are installed and are the latest
version, and packages that are installed but can be updated. To install
packages, click the checkbox next to their names, and to remove
packages, uncheck them. The changes will only be applied once you click
Apply.
Jupyter
Jupyter comes in two types as well: Jupyter Notebook and Jupyter Lab.
Jupyter Notebook is an older interface that is based on the
notebook concept, where you combine text with code in a single
(.ipynb
) file. Jupyter Notebooks are extremely
popular in the Python community, as they are easy for writing
documentation and tutorials (vignettes). They are displayed
automatically in GitLab and GitHub. However, Jupyter Notebooks
are less popular in R, because RMarkdown provides a similar
solution for R (and recently Quarto extends RMarkdown
beyond R, therefore directly competing with Jupyter Notebooks).
An issue with Jupyter Notebooks is that, due to mixing of text
and code, you cannot run a notebook from the terminal, or in fact use
any IDE other than Jupyter Notebook. They are also not well
compatible with Git, as multiple developers cannot edit the
same file without causing spurious conflicts, albeit there are
workarounds.
Jupyter Lab is a newer interface that allows directly editing R and Python files, in addition to opening Jupyter Notebooks. Its advantage is that it can edit multiple scripts in multiple languages at the same time, but its disadvantage is that it cannot provide functionality that is specific to any one language, therefore it is relatively barebones.
Like RStudio Server, Jupyter is mainly designed to be run on a server. It can be run locally by launching a server and then connecting to it through a web browser. However, since Jupyter is written in Python, it requires specific Python setup to install. We will learn more about how to run Jupyter in the Python part of the course.
The fact that Jupyter is open source and designed to work as a server has created opportunities for Google to create something of a Google Docs version of Jupyter, that is called Google Collaboratory, or Google Colab for short. Google provides every user of Google Colab with computer resources to run Python or R. It is therefore the easiest way to run Python without changing anything on your own computer. If anything goes wrong, you can always create a new Colab notebook and start over.
First, go to the Google Colab website and click on + New notebook to create a Colab notebook. The default kernel type in Google Colab is Python, but you can change it to R by going to Runtime -> Change runtime type and selecting R. Click Connect at the top right to start the kernel, so that you can run code blocks.
Note that Google Colab has a special character, the
exclamation point !
, which allows you to run Bash commands
within Python code blocks. This is very useful and important for
installing Python packages, as they are typically installed using the
pip
command from Bash. You can try running a code block
like this to install and load the rasterio package:
Python
Aside from Jupyter, you can use other IDEs for Python. A simple to install IDE is called Spyder, as it is its own Python package. You can also use more complex IDEs for Python scripting, such as PyCharm and Microsoft Visual Studio Code, but they are more complicated to install.
You will learn more about Python IDEs later in the course. For now, we will stick with using Google Colab to run Python code.
The overall system architecture
To recap, the overall system architecture is comprised of integrated development environments (IDE), engines, packages, bindings, and libraries. Let’s start with the engine, the core program that executes the foundation or crucial tasks of the programming language. The most relevant engines in this course are Python and R. To interact with the engine, we can either code in the command line directly or we can use an IDE, which provides many tools and features for working with the engine integrated in a single environment. An IDE allows the developer to write code, test it, and debug it all in a single software application.
We then install packages in the IDE, these are collections of related code files, libraries, and resources that are related and compatible with each other. Libraries are pre-written code that provide specific functions and services. The idea of libraries and packages is to make coding more efficient by the reuse of common functions.
What are packages?
Packages extend the functionality of a programming language by providing new functions (and/or datasets). There is a huge variety of packages in both R and Python. But how are packages made?
In essence, packages are nothing more than bundles of scripts! Typically, these scripts provide functions: small pieces of scripts that do one particular task, typically with a name and input arguments. By loading a package, we simply place these functions in what is called the global environment: a part of your computer memory that holds your variables, function definitions, datasets etc. for the currently running session of R or Python.
Packages are formalised by a set of rules that all developers have to abide by. One important rule is that the action of merely loading a package should not cause harm to the user (or their environment): functions can be loaded into the global environment, but should not overwrite what is already there, or run any code (which could potentially delete files etc.). Therefore, packages typically consist of script files with nothing other than function definitions. Running those functions is left up to the users in their own scripts. How to run them is documented using examples, either in comments next to the functions, or in an external help file, but never directly as executable code.
In addition to the scripts (and their documentation), packages typically include some metadata, such as the name of the author, package version, what other packages are needed to run this package, etc. In R, this metadata is mandatory, whereas in Python it’s to some extent optional.
Packages also typically follow a particular structure, i.e. where files are located and how. Let’s take a closer look at the package structure in both Python and R.
Package structure
In Python
A simple package (module) in Python is simply a directory with a Python file in it. In Colab, this code will create a directory and a Python file containing a Hello World example:
! mkdir -p MyPackage
! echo "def helloworld():" > MyPackage/helloworld.py
! echo " print('hello world')" >> MyPackage/helloworld.py
Now we can directly make use of our new helloworld()
function in Python:
In the import name, MyPackage
is the name of the
directory, and helloworld
is the name of the file. We
rename MyPackage.helloworld
to mphw
for
brevity, but it’s optional.
Well, that was easy!
Though our package is technically not a complete package yet, as it can only be run if we copy the file to every code project that wants to use our function. A true package can be installed and then run from any place on your computer. Thankfully that’s also easy to do! All we need is to provide a bit more structure and metadata.
The metadata file in Python is called pyproject.toml
,
and it has to be in a subdirectory with the name of the package, as the
name will appear in the list of packages. The actual functions need to
be in a subdirectory of that subdirectory, with the name of the package,
as the package will appear in Python. Typically, both of these names
should be the same, otherwise it will result in user confusion. Lastly,
if we want to be able to just import the package to get our functions,
without having to specify the name of the file that holds the functions
we want to import, we should call the script with our functions
__init__.py
.
In summary, the package structure looks like this:
MyPackage
├── MyPackage
│ └── __init__.py
└── pyproject.toml
Let’s make our files match this structure! First, make the
subdirectory MyPackage/MyPackage
, and then rename our
helloworld.py
to
MyPackage/MyPackage/__init__.py
:
Next, let’s add the project metadata. We only need to specify the name of the package and its version. To prevent confusion, use the same name as the name of the two directories.
! echo '[project]' > MyPackage/pyproject.toml
! echo 'name = "MyPackage"' >> MyPackage/pyproject.toml
! echo 'version = "0.1"' >> MyPackage/pyproject.toml
Bam, we now have a true package! We can install it with
pip
:
And now we can use our function from any directory simply as:
In R
Packages in R are also nothing more than an organised collection of R scripts. Compared to Python, there are a few more files necessary to make an R package, but thankfully, we also have tools that help to create one!
Once again, let’s make a Hello World function and put it into a file:
Let’s make it into a package! We use the function
package.skeleton()
to autocreate a package structure:
## Creating directories ...
## Creating DESCRIPTION ...
## Creating NAMESPACE ...
## Creating Read-and-delete-me ...
## Copying code files ...
## Making help files ...
## Done.
## Further steps are described in './MyPackage/Read-and-delete-me'.
Let’s follow the instructions to read the file
MyPackage/Read-and-delete-me
. And then delete it:
Let’s see what is the R package structure:
## MyPackage
## ├── DESCRIPTION
## ├── man
## │ ├── HelloWorld.Rd
## │ └── MyPackage-package.Rd
## ├── NAMESPACE
## └── R
## └── HelloWorld.r
##
## 3 directories, 5 files
It’s a little bit more complicated than Python, but not by much. The
script files are stored in a directory called R
(as R
packages can also include C++ code etc.), and our script file is already
there. Next we have the DESCRIPTION
file, which, like the
pyproject.toml
file we saw for Python, includes metadata
about the package. You can open it and edit it. Lastly, we have the
NAMESPACE
file, which states which functions will be
accessible to package users (as opposed to only internal to the
package), and a man
directory, which includes the
documentation manual entries. We have two entries:
HelloWorld.Rd
, which is the description of our
HelloWorld
function, and MyPackage-package.Rd
,
which is a description of the package itself. These documentation files
are written in an R
variant of LaTeX.
Let’s install our package! First we need to build it, from the terminal:
## * checking for file ‘MyPackage/DESCRIPTION’ ... OK
## * preparing ‘MyPackage’:
## * checking DESCRIPTION meta-information ... OK
## * installing the package to process help pages
## * saving partial Rd database
## * checking for LF line-endings in source and make files and shell scripts
## * checking for empty or unneeded directories
## * building ‘MyPackage_1.0.tar.gz’
As you can see, the package is just a zip of our package files, plus some metadata cache. Now we can use R itself to install our new package:
## Installing package into '/home/osboxes/R/x86_64-pc-linux-gnu-library/4.4'
## (as 'lib' is unspecified)
## inferring 'repos = NULL' from 'pkgs'
Perfect, we can immediately load it and use it:
## [1] "Hello world"
As a bonus, we can also see the documentation we had in the
man
files:
That was also pretty easy!
To summarise, we have learned that packages in both Python and R are nothing more than scripts that contain functions, in a particular directory structure. If we start developing our projects with that structure in mind, we can make packages out of our scripts very easily!
Tip: Once you have a package, you can also have it uploaded to a package repository, so that others (or you from another computer) can easily download and run the code in the package. However, it does require you to have your code in good working order and be well documented, since, after all, it’s going to be public for everyone to see!
The package repository for Python is called PyPI, The Python Package Index. PyPI is developer-friendly, which means that you can easily upload your packages on PyPI without much hassle, as long as your project is in a reasonable shape and declares all its dependencies properly. See the PyPI documentation to learn more about how to submit packages to PyPI.
The package repository for R is called CRAN, the Comprehensive
R Archive Network. Unlike PyPI, CRAN is end-user-friendly. That means
that submitting packages to CRAN is not nearly as easy, because it has
much higher quality requirements for any (new) packages. All packages
submitted to CRAN have to pass through both automatic tests (using
R CMD check
command), as well as human review. Your package
will be rejected if it depends on a package version that is not
currently on CRAN, if you fail to document any function parameter, make
any typo anywhere or even fail to use quotes correctly. Therefore, the
package submission process for CRAN is more akin to peer review for
publishing a scientific paper. However, it ensures that packages always
work together with other packages, which immensely simplifies package
dependency management and allows users to simply use
install.packages()
and expect the new package to just work.
Packages are also regularly removed from CRAN if they stop working due
to updates in their dependencies, making sure that all packages on CRAN
stay compatible with each other. See CRAN
documentation for more information about package submission
requirements.
Good programming habits
Project structure
Making packages is a great way to stay organized, keep track of what you are doing and be able to use it quickly and properly at any time. Packages are:
- Easy to share with others
- Dependencies are automatically imported and functions are sourced (reduces the risk of having broken dependencies)
- Documentation is attached to the functions and cannot be lost (or forgotten)
For these reasons, if you build a package, next year you will still be able to run the functions you wrote yesterday. Which is often not the case for stand alone functions that are poorly documented and may depend on many other functions … that you cannot find any more. So to summarize, packages are not only a way to extend functionality, they are also the standard way to archive and save functions.
To make it easy for you to make a package out of your code, you need to stick with a project structure that packages have. Another benefit of maintaining a consistent project structure is that it will make it easier for you to switch from one project to the other and immediately understand how things work.
To practice keeping a consistent project structure, in this course we will be following the structure below:
- A
main
script at the root of the project. This script performs step by step the different operations of your project. It is the only non-generic part of your project (it contains executable code outside functions, including paths, already set variables, etc.). The file extension of this file will depend on what language you are using for your project. We typically call these filesmain.r
andmain.py
, but it could also be e.g.task1.r
. - As we will be working with multiple languages throughout this course
we will keep things organized by placing the scripts into their
respective language sub-directories (
R/
,Python/
, andBash/
). These directories should contain the functions you have defined as part of your project. These functions should be as generic as possible and are sourced and called by themain
script. The way this is done depends on the language used by themain
script. For example, in R you would writesource("R/myfunction.R")
. Whereas in Python, you would useimport Python.myfunction
.- Each file in the
R
andPython
directory should ideally consist of a single function with the same name as the file itself, to make it easy to find. In Python it is common to combine multiple functions in one file, because typically you refer to the file (module) name in addition to the function name (throughimport MyPackage
), but it is still good practice to keep each function in its own file (so you don’t get confused where each function comes from even if you dofrom MyPackage import *
).
- Each file in the
- A
data/
subdirectory: This directory contains data sets of the project. Since Git is not as efficient with non-text files, and GitLab has storage limits, you should only put small data sets in that directory (<2-3 MB). Typically, you do not include it in your git repository at all; rather, this directory is created from yourmain
script, and is used to store data downloaded from the internet. It can be safely removed after the script is finished running. - An
output/
subdirectory (when applicable), where you place the final result of running your script. This should also not be tracked by git: your scripts create the output, so there is no need to store it. - A
README.md
file should be included, this file should contain a description of your project, along with a description of what other packages your script needs to function correctly, and any other instructions needed to correctly run your code. - Finally, you should include a
LICENSE.txt
file with the software licence which you would like your code to have.
Warning: Never upload large (raster) files to git! If your repository exceeds 100 MiB in size, it will no longer be loaded by CodeGrade. Deleting the files will have no effect, because Git stores the history forever. Fixing a repository like that is extremely complicated and time-consuming!
Note: Without a LICENSE
file, your code is
copyright, which means that nobody has the right to even read it, let
alone run it! Make sure to always specify a license that would at least
allow that.
Example main
file
Typically the header of your main script will look like the following:
In R (main.R
)
# Team Teamname (John Doe and Jane Smith)
# January 2020
# Import packages
library(terra)
library(sf)
# Source functions
source('R/download_data.R')
source('R/function2.R')
# Use sourced function to download data to the directory "data"
download_data("data")
# Load datasets
postboxes <- st_read('data/postbox_locations.gpkg')
# Then the actual commands
In Python (main.py
)
# Team Teamname (John Doe and Jane Smith)
# January 2020
# Import packages
import geopandas as gpd
import matplotlib.pyplot at plt
# Import functions
import Python.download as download
# Use imported function to download data to the directory "data"
download.download_data("data")
# Load datasets
postboxes = gpd.read_file('data/postbox_locations.gpkg')
# Then the actual commands
Working directory, relative and absolute file paths
At the end of the following section you should be able to explain the difference between the following:
- relative path
- absolute path
- working directory,
- And the following special directories:
.
..
/
or the root directory
In the R and Python examples above we load the datasets by indicating
the file location from the data directory
"data/postbox_locations.gpkg"
. However, you may have many
data folders on your computer, for all types of different projects. So
how does the system know to look in the correct one? Moreover, if you
share your script with a friend the location of their project and data
folders will be different than that of your setup. It would be a
nuisance if they had to change all references to these files in their
script. To deal with these issues, we use relative file
paths.
In relative file paths, we don’t include the
location of the project (working) directory itself,
these paths are relative to the working directory (the
“Project_Structure” folder). In the example above, the
relative file path for the post box locations file
would be data/postbox_locations.gpkg
, whereas the
absolute file path would be
"/home/osboxes/Geoscripting/Project_Structure/data/postbox_locations.gpkg"
.
The absolute file path refers to a file from the
root of the entire file system. On Linux (and other
UNIX-like systems like macOS), absolute file paths
always start with /
.
Note: on Windows, you might see a backslash
(\
) being used as a path separator instead of a slash
(/
). Don’t do this! In many languages,
including R, a backslash denotes an escape
sequence. In addition, a backslash is not a valid path separator on
non-Windows platforms, whereas both a slash and a backslash are valid on
Windows. So save yourself the trouble and always use a slash as
a path separator!
So what is the working directory? By convention, the working directory is the same as the location of the script which you are working on. This means that you can simply assume that whoever runs your script, will run it from the directory that your script is located.
Note: this also means that when you test others’ code, you should also make sure to run it from the directory that the script is located, unless stated otherwise!
To refer to directories or files that are within the working
directory, we simply use their names. So to refer to a directory called
R
in our working directory, we type R
. To
refer to a file or directory within another directory, we type the name
of the directory, a slash, and then the name of the file/directory, for
instance, R/function1.R
.
If we want to refer to a file or directory that is above the
indicated directory, we use the special directory ..
. For
instance, if our main.R
is not in our project root, but
located in the sub-directory demo
(therefore our working
directory is demo
), we would refer to our
function1.R
file as ../R/function1.R
. Another
special directory is .
, which refers to the indicated
directory itself.
When making Git repository, you want to make sure that all your code is portable and self-contained, i.e. you can run it from any computer, and ideally using any operating system. That means that as a rule of thumb you should always use relative file paths in your scripts.
Question 1: what would be the location of this file:
././R/./.././R/./././function2.R
? How about/./R/./.././R/./././function2.R
? What would be the meaning ofC:/Windows/cmd.exe
on Linux? Is it a relative or absolute file path?
Documentation
As we have seen in the package examples, documentation of code is
extremely important to understand what your code is doing. It’s not only
for others reading your code, but also for yourself, when you have to
revisit code you haven’t touched in a while. If you follow the good
scripting habits mentioned above, you will already have your code
divided into reusable functions, that are in their own function files.
Each function should have enough comments to understand each step the
function is performing. Comments are any words after the #
character, in either language. Same goes for the main script, it should
also have each step described. But you don’t need to make it redundant:
if it’s already clear from the function name what it does, it’s useless
to repeat it in a comment. Rather, you would describe sections of code
at once, e.g. what a loop does by itself, rather than its indvidual
steps. In addition, if you use code from someone else, make sure to obey
the license of the code you are using, and give credit to the original
author using comments.
In addition to comments of your code, each function should also have a description and an explanation for all of its arguments. In both R and Python, there is a special way to add this information in comments next to the function definition, so that help files can be generated out of them automatically. The systems have different conventions, but effectively achieve the same purpose.
In Python, the system is called Docstrings. It consists of strings using triple quotes, that effectively acts as a multi-line comment. Here’s an example of a Python function documented with a Dosctring using the official reStructuredText (Sphinx) format:
def hello(name):
"""Greet a person.
Give a customised "Hello" greeting for the input name.
:param name: A string containing the name of the person to greet.
:return: A greeting string.
"""
return("Hello " + name)
The Docstring will be automatically printed when you run
help(hello)
, and will be converted into a nice webpage
using tools such as Sphinx.
In R, the system is called roxygen2 and is
provided by the roxygen2
package. Here’s an equivalent
example in R, using roxygen2:
#' Greet a person
#'
#' Give a customised "Hello" greeting for the input name.
#'
#' @param name A string containing the name of the person to greet.
#' @returns A greeting string.
hello <- function(name) {
return(paste("Hello", name))
}
The roxygen2 package was created to make it easier to write
documentation in R, so that documentation does not have to be separate
from the code. The roxygenize()
function generates the
documentation files from the comments automatically, and then the
documentation appears when running ?hello
in our
example.
Summary
Increasing your scripting/programming efficiency goes through adopting good scripting habits. Following the guidelines above will ensure that your work:
- Can be understood and used by others.
- Can be understood and reused by you in the future.
- Can be debugged with minimal effort.
- Can be re-used across different projects.
- Is easily accessible by others.
The summary of good practice guidelines below is not exhaustive, but already constitutes a good basis that will help you getting more efficient now and in the future when working on scripting projects:
- Comment your code.
- Write functions for code you need more than once:
- Document your functions.
- Keep consistent style.
- Make your own packages, or at least keep a similar directory structure across your projects.
- Use version control to develop/maintain your projects and packages.