Course setup
Important: The Geoscripting course is a Master-level course given at Wageningen University. This set of documents provides the theoretical material from the course, both for use in the course itself and for people who are following (parts of) the course externally or are generally interested in the topics that we cover. As such, these documents aim to be generic for all of the user groups above.
If you are a student following the course at Wageningen University (WUR), please read the information in the course guide in Teams and on Brightspace. All course-specific information and exercises can be found there. Information in the course guide overrules any information written in these pages, so please read it carefully and check it often. You will also find all the information on deliverables and exercises there.
Linux & version control
Introduction
Welcome to the Geoscripting course! Today we will get familiar with Linux, which is an advanced environment optimised for scripting, and with version control software that helps you collaborate with one another and keep track of your file versions. These tools are very important: we will use them throughout the course for all course activities, and they will continue to be very useful for all your scripting work after the course ends. Additionally, you will learn about project structure and familiarize yourself with RStudio.
Throughout the whole course, we will be working in a Linux environment, and all of the material has only been tested on (and assumes) a Linux environment. Every WUR student will get access to a Linux virtual machine.
Learning objectives
At the end of the tutorial, you should be able to:
- Know what Linux is & what you can do with it
- Get comfortable working within a Linux environment
- Explain why software licenses are important and what software license options there are
- Apply a software license to your own code
- Use version control to develop, maintain, and share your code with others
- Set up a project structure
- Get familiar with (relative) paths
- Submit an exercise using Git and GitLab
Linux
Linux is a free and open-source operating system kernel. The kernel interacts with computer hardware and exposes its capabilities for your scripts! Together with a lot of small, handy programs, it forms an operating system called GNU/Linux. However, unlike e.g. Windows, there is not a single “GNU/Linux operating system”. Rather, there is a huge variety of Linux distributions. Each Linux distribution provides the same kernel, but different programs and environments, suitable for different use cases.
For example, one distribution that is very handy for geo-information science work is OSGeo-Live, which is an Ubuntu-based Linux distribution that has a wide range of free and open-source GIS and Remote sensing tools preinstalled. See this website for more information.
These tools are also available in other distributions, but there they have to be installed manually. A general-use distribution such as Ubuntu itself, openSUSE or Fedora is more suitable for regular day-to-day tasks: without the tools you do not need, it takes up less space and can run faster. It is also easier to find help for general-use distributions than for specialised ones.
For the Geoscripting course, we have developed what is effectively our very own Linux distribution, designed to provide all of the tools necessary to finish the course. These tools are also very useful after the end of the course to continue data processing, for example when writing a Master thesis. Within our laboratory, several computers run this Geoscripting distribution, which makes it as easy as possible to transfer work from one computer to another, so you can continue working uninterrupted even after the end of the course. The Geoscripting distribution is nothing more than a set of scripts that install the necessary tools on top of what plain Ubuntu provides.
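To see the difference between the kernel and the distribution on your own machine, you can ask Python for both. The following is a minimal sketch, assuming that your distribution ships the standard `/etc/os-release` file (most current distributions, including Ubuntu, do). The function names `parse_os_release` and `describe_system` are made up for this illustration.

```python
import platform
from pathlib import Path


def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not KEY=value
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes
        info[key] = value.strip().strip('"').strip("'")
    return info


def describe_system(os_release_path="/etc/os-release"):
    """Return the kernel release and, if available, the distribution name."""
    path = Path(os_release_path)
    distro = parse_os_release(path.read_text()) if path.is_file() else {}
    return {
        # The kernel: the same project across all Linux distributions
        "kernel": f"{platform.system()} {platform.release()}",
        # The distribution: the kernel plus a chosen set of programs
        "distribution": distro.get("PRETTY_NAME", "unknown"),
    }


if __name__ == "__main__":
    for key, value in describe_system().items():
        print(f"{key}: {value}")
```

Running this on two different distributions should report different distribution names, while the kernel line always starts with `Linux`.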
Why use a Linux distribution?
A Linux environment makes it much easier to install and combine a variety of open-source software, such as Python modules and GDAL, compared to other operating systems like Windows or macOS. In addition, open-source scientific software is often developed primarily for Linux (since that’s what most supercomputers and servers run!), and so it tends to be more stable and have more features on Linux. Lastly, Linux has a set of standards that allow programs to interoperate with each other, so that e.g. you can access GRASS GIS from R, QGIS from Python, GRASS GIS from QGIS, Python from R etc. All of this is managed and checked for quality so that you can always use the latest and greatest software without worrying about version mismatches and compatibility between software tools.
For the course, it also makes it possible to use the wide variety of tools that we will work with, all from a single supported environment. That way, we can be sure that the tools work the same way for all of the students, and that we also test the exercise submissions using the same versions of the tools to get the same output.
Getting started on Linux
During the course we will work in a Linux environment. See the Linux system setup page on how to install and run the Linux virtual machine on your own computer. The page also explains how to run Linux from a USB stick in case you don’t have enough space for a virtual machine.
Notice: Make sure you read the page linked above and have no problems logging into and using the VM. From here on out, we will try to work from within the VM exclusively.
In case you can’t get the VM running successfully (and only in that case, so hopefully you don’t need to do this!), there is an alternative: we have the possibility of providing access to a SURF Research Cloud VM setup. See this page for instructions on gaining access to the SURF Research Cloud VM.
If you are a power user and want to install Linux on your own laptop directly to have it run at full performance, see also a theoretical overview of running Linux on your own hardware.
The VMs are strongly recommended. If you go for installing Linux yourself, the systems need to be set up manually and we do not have the time and manpower to support every student with this.
Once you have everything ready, log in to your Linux VM, try out RStudio/RKWard, and also open QGIS. Explore the environment a little to get used to it.
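If you want a quick check that the main tools are available in your environment, a short script can look them up for you. This is only a sketch: the command names in `TOOLS` are assumptions about how the programs are called on your system, and `find_tools` is a helper made up for this example.

```python
import shutil

# Assumed command names; adjust them to what your system actually provides
TOOLS = ["R", "rstudio", "rkward", "qgis", "python3", "gdalinfo"]


def find_tools(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(TOOLS).items():
        print(f"{name:10s} {path or 'not found'}")
```

A tool reported as "not found" is not necessarily missing: it may simply be installed under a different command name.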
Software licenses
One key advantage of Linux is that it is free and open-source software. While it is free as in free beer, that is, it can be used at no cost, more importantly it is free as in free speech: all of the source code of the kernel and the absolute majority of the applications is licensed under a free software license.
A software license is a legal text that describes how the software and its source code can be used by other people. Software licenses are grounded in the framework of copyright: the protection of authors’ intellectual rights. A free software license is a software license that gives others the freedom to run, copy, read, modify, and distribute changes to the original software and its source code. This differs from an open-source license, which makes the source code available and redistributable, but does not necessarily make the source code free. Both free and open-source licenses have their overseeing bodies: the Free Software Foundation for free software licenses, and the Open Source Initiative for open-source licenses. When software fits both definitions (they often overlap), it is referred to as Free and Open-Source Software (FOSS), or less often as Free, Libre and Open-Source Software (FLOSS).
There are many advantages to FOSS. One advantage is that it fosters collaboration: one person implementing a feature makes it available to all users in the world. This makes the massive effort required to create GNU/Linux distributions possible on the basis of volunteer work, without needing to rely on commercial licensing, advertisements, donations or spyware to finance the work. It also allows anyone to remove such undesired parts from any software component, which helps to ensure higher quality of the software. Thus, while FOSS projects often start out weaker than proprietary (non-free or closed-source) software, in the long run the collaboration potential brings them on par with their proprietary counterparts and can even let them overtake those. See for example QGIS, which is FOSS, vs the proprietary ArcGIS.
A software license defines what others can do with your code. Before starting to write any code, it is therefore vital to think about the license you would like to release your code under. If you do not define any license, the default copyright terms apply: even if you publish the source code publicly, so that others can read it, nobody is allowed to copy, redistribute or modify it. As an author, you are free to choose any license, proprietary or FOSS (or in fact no license at all), but a proprietary license restricts the freedoms of others and therefore diminishes the chances that others will want to collaborate with you to improve the code in the future. In addition, do not confuse a software license with commercial licensing, i.e. the requirement to activate a license subscription to use
There are two types of FOSS licenses: copyleft and permissive. A permissive license is one that allows copying, modifying and redistributing the code with no serious restrictions (usually with a restriction that the original author be credited for the work). A copyleft license adds a restriction that any modified versions that are distributed must be under the same (or equivalent) license. This restriction restricts others from restricting the terms of the software license in the future, therefore keeping the source code free forever. In other words, it’s following the philosophy that if we want to achieve the most freedom, we must restrict the freedom to restrict freedom!
Lastly, there is also an option to dedicate software to the public domain, which is not a license per se, but a waiver of copyright. Software in the public domain allows anyone to do anything with it without any restrictions, so it is radically permissive. There is no requirement to credit the original author, for example. Since some jurisdictions (including Germany, France and Italy) do not allow authors to waive copyright, there are licenses such as CC0 that aim to make a work as free as possible by dedicating it to the public domain or, where that is not possible, by giving it a permissive license.
How can you choose a software license in practice? There are multiple websites that give an overview of the most popular licenses that you can choose. Once you choose one, you need to follow the terms of the license about how to apply it. In most cases, it is sufficient to copy the terms of the license next to your source code and include it in your version control repository.
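As an illustration of the last step, the sketch below places a license file in the root of a project. It assumes the common convention of a plain-text file named `LICENSE`. The function `apply_license` is a hypothetical helper written for this example, and the license text is a placeholder: always copy the full, unmodified text of the license you chose, and check that license's own instructions for any additional steps.

```python
import tempfile
from pathlib import Path


def apply_license(project_dir, license_text, filename="LICENSE"):
    """Write the license text to a file in the project root.

    Refuses to overwrite an existing license file, so an earlier
    choice is never silently replaced.
    """
    target = Path(project_dir) / filename
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    target.write_text(license_text, encoding="utf-8")
    return target


if __name__ == "__main__":
    # Placeholder only: paste the full text of your chosen license here
    placeholder_text = "Full text of the chosen license goes here.\n"

    # A temporary directory stands in for your real project directory
    with tempfile.TemporaryDirectory() as project:
        path = apply_license(project, placeholder_text)
        print(f"Wrote {path.name} ({path.stat().st_size} bytes)")
```

Keeping the license as a file inside the project directory means that it is tracked by version control together with the code it applies to.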
Question 1: If you wanted to contribute to a project that is licensed under the GNU General Public License v3 (copyleft), under which license(s) could you contribute? Which license would you choose in the end?
Version control
Have you ever worked on a project and ended up having so many versions of your work that you didn’t know which one was the latest, and what were the differences between the versions? Does the image below look familiar to you? Then you need to use version control (also called revision control). You will quickly understand that although it is designed primarily for big software development projects, being able to work with version control can be very helpful for scientists as well.