How to Convert a Project to REUSE Compatible License Statements?

This blog post provides a step-by-step example about how the conversion of a project to REUSE compatible license statements is done in practice. For my setup, I have a readily configured kdesrc-build environment.

First, I check out the most recent source code of the project I want to convert. For this tutorial, I use KTurtle, which is a nice and small application from KDE Education with just about 200 files.

Then I obtain the latest version of licensedigger and compile it:

kdesrc-build licensedigger

First I do a dry run to get an impression about how well the licenses are detected in KTurtle. But, well, it looks really bad:

$ /opt/kde/build/playground/sdk/licensedigger/licensedigger --dry kturtle/
Digging recursively all files in directory: "kturtle/"
= LICENSE DETECTION OVERVIEW =
"kturtle/CMakeLists.txt" --> "UNKNOWN-LICENSE"
"kturtle/doc/CMakeLists.txt" --> "UNKNOWN-LICENSE"
"kturtle/icons/CMakeLists.txt" --> "UNKNOWN-LICENSE"
"kturtle/org.kde.kturtle.appdata.xml" --> "UNKNOWN-LICENSE"
"kturtle/spec/assert_spec.rb" --> "UNKNOWN-LICENSE"
"kturtle/spec/boolean_operator_spec.rb" --> "UNKNOWN-LICENSE"
"kturtle/spec/empty_spec.rb" --> "UNKNOWN-LICENSE"
"kturtle/spec/expression_spec.rb" --> "UNKNOWN-LICENSE"
"kturtle/spec/for_spec.rb" --> "UNKNOWN-LICENSE"
"kturtle/spec/if_else_spec.rb" --> "UNKNOWN-LICENSE"
"kturtle/spec/kill_kturtle.rb" --> "UNKNOWN-LICENSE"
"kturtle/spec/learn_spec.rb" --> "UNKNOWN-LICENSE"
"kturtle/spec/math_spec.rb" --> "UNKNOWN-LICENSE"
"kturtle/spec/number_spec.rb" --> "UNKNOWN-LICENSE"
"kturtle/spec/repeat_spec.rb" --> "UNKNOWN-LICENSE"
"kturtle/spec/scope_spec.rb" --> "UNKNOWN-LICENSE"
"kturtle/spec/spec_helper.rb" --> "UNKNOWN-LICENSE"
"kturtle/spec/start_kturtle.rb" --> "UNKNOWN-LICENSE"
"kturtle/spec/string_spec.rb" --> "UNKNOWN-LICENSE"
"kturtle/spec/variable_assignment_spec.rb" --> "UNKNOWN-LICENSE"
"kturtle/src/CMakeLists.txt" --> "UNKNOWN-LICENSE"
"kturtle/src/Messages.sh" --> "UNKNOWN-LICENSE"
"kturtle/src/canvas.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/canvas.h" --> "UNKNOWN-LICENSE"
"kturtle/src/colorpicker.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/colorpicker.h" --> "UNKNOWN-LICENSE"
"kturtle/src/console.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/console.h" --> "UNKNOWN-LICENSE"
"kturtle/src/directiondialog.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/directiondialog.h" --> "UNKNOWN-LICENSE"
"kturtle/src/editor.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/editor.h" --> "UNKNOWN-LICENSE"
"kturtle/src/errordialog.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/errordialog.h" --> "UNKNOWN-LICENSE"
"kturtle/src/highlighter.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/highlighter.h" --> "UNKNOWN-LICENSE"
"kturtle/src/inspector.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/inspector.h" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/dbus_adaptor_generator.sh" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/definitions.rb" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/echoer.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/echoer.h" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/errormsg.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/errormsg.h" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/executer.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/executer.h" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/generate.rb" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/interpreter.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/interpreter.h" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/org.kde.kturtle.Interpreter.xml" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/parser.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/parser.h" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/token.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/token.h" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/tokenizer.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/tokenizer.h" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/translator.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/translator.h" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/treenode.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/treenode.h" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/value.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreter/value.h" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreteradaptor.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/interpreteradaptor.h" --> "UNKNOWN-LICENSE"
"kturtle/src/main.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/mainwindow.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/mainwindow.h" --> "UNKNOWN-LICENSE"
"kturtle/src/sprite.cpp" --> "UNKNOWN-LICENSE"
"kturtle/src/sprite.h" --> "UNKNOWN-LICENSE"
Undetected files: 69 (total: 69)

What we get from this output is that apparently no license header is detected at all. This looks suspiciously like a new kind of license header text for which licensedigger was not trained yet. Thus, I arbitrarily open one of the failing files, let’s say “kturtle/src/sprite.h”, and have a look at the header. The stated license header itself looks quite sane and clearly translates to the SPDX identifier “GPL-2.0-or-later”:

Copyright (C) 2003-2008 Cies Breijs

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free
Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
Boston, MA 02110-1301, USA.

Thus, I first try adding the missing license header text to the licensedigger database (see the licensedigger README.md file, or the commit with which I did this, for more explanation). Surprisingly, when testing the license header detection by running the unit tests with “make test”, the header is still not detected… It turns out that tabulators for indenting license header texts are not supported yet. This is easy to fix, which is done in a follow-up commit. But if you ever run into a similar problem, just create an issue and point me to the source file that causes the headaches! (Such problems are really a rare exception though!)

Now I run licensedigger again and check whether there are files remaining whose license headers are not correctly detected. This is the case for me, so I repeat the above steps. Finally, I get to the point where all stated licenses are correctly detected (we cannot convert unstated ones; recovering missing license statements is a completely different topic).
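
To see at a glance which files are still undetected after each round, the overview can be filtered with a few lines of script. This is only a minimal sketch, assuming the dry-run output keeps the "file" --> "license" shape shown above; the helper name undetected_files is made up for illustration and is not part of licensedigger.

```python
import re

# One overview line of the dry run looks like:  "path/to/file" --> "LICENSE-ID"
LINE_RE = re.compile(r'^"(?P<path>[^"]+)" --> "(?P<license>[^"]+)"$')


def undetected_files(output):
    """Return the paths that the dry run reported as UNKNOWN-LICENSE."""
    paths = []
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("license") == "UNKNOWN-LICENSE":
            paths.append(match.group("path"))
    return paths


# Three lines copied from the overview above.
sample = "\n".join([
    '= LICENSE DETECTION OVERVIEW =',
    '"kturtle/CMakeLists.txt" --> "UNKNOWN-LICENSE"',
    '"kturtle/src/sprite.h" --> "UNKNOWN-LICENSE"',
])

for path in undetected_files(sample):
    print(path)
```

Piping the real dry-run output into such a filter gives a short to-do list of headers that still need to be added to the licensedigger database.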

Finally, I run licensedigger in its conversion mode. Note that you can also run licensedigger many times on an already converted codebase and nothing bad happens; I simply prefer to distinguish between dry and conversion runs.
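
To give an idea of what the conversion does to a header like the one shown above, here is a rough sketch. It is not licensedigger's actual implementation: the function name to_spdx_header and the copyright regular expression are assumptions made for illustration. Only the two tags, SPDX-FileCopyrightText and SPDX-License-Identifier, come from the REUSE and SPDX conventions.

```python
import re

# Matches a traditional copyright line such as "Copyright (C) 2003-2008 Cies Breijs".
COPYRIGHT_RE = re.compile(
    r"^\s*Copyright\s*(?:\(C\))?\s*(?P<statement>.+?)\s*$", re.IGNORECASE
)


def to_spdx_header(old_header, spdx_id):
    """Build SPDX tag lines from an old header and its detected license id."""
    lines = []
    for line in old_header.splitlines():
        match = COPYRIGHT_RE.match(line)
        if match:
            lines.append("SPDX-FileCopyrightText: " + match.group("statement"))
    if lines:
        lines.append("")  # blank line between copyright and license tags
    lines.append("SPDX-License-Identifier: " + spdx_id)
    return "\n".join(lines)


old_header = """Copyright (C) 2003-2008 Cies Breijs

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later version."""

print(to_spdx_header(old_header, "GPL-2.0-or-later"))
```

The long license paragraph collapses into a single identifier line, while the copyright statement is kept in a machine readable form.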

When that is done, the last remaining steps are:

  1. Review the changes licensedigger did to your source files (I like to use the command line tool “tig” for this, but there are many options).
    1. I commit the changes to the source files to have a base-line for possible manual edits.
    2. For KTurtle I actually manually formatted the license headers after the conversion to remove the tabulators. Unfortunately, there is no tooling yet for auto-formatting license headers (patches welcome 😉 )
  2. I add the new LICENSES/ directory to the Git repository, which contains the canonical license texts according to the REUSE specification. Moreover, I remove the now obsolete COPYING file, because that license text now is in the LICENSES directory.
  3. Finally, I can create a merge-request on invent.kde.org, because it is always good to let somebody else check your work.
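
Before opening the merge request, it can help to check that every identifier used in the sources has its license text in LICENSES/, since REUSE expects one file named after the SPDX identifier with a .txt suffix. The following is a minimal sketch of such a check; the function names are hypothetical and the parsing of SPDX expressions is deliberately simplistic.

```python
import re
from pathlib import Path

TAG_RE = re.compile(r"SPDX-License-Identifier:\s*(.+)")
ID_RE = re.compile(r"^[A-Za-z0-9.+-]+$")
# Words that combine identifiers in an SPDX expression, not identifiers themselves.
OPERATORS = {"AND", "OR", "WITH"}


def identifiers_in(text):
    """Collect the license and exception ids from all SPDX tags in a text."""
    ids = set()
    for expression in TAG_RE.findall(text):
        for token in re.split(r"[\s()]+", expression):
            # ID_RE also drops comment debris such as a trailing "*/".
            if ID_RE.match(token) and token not in OPERATORS:
                ids.add(token)
    return ids


def missing_license_texts(root):
    """List ids used below root that have no LICENSES/<id>.txt file."""
    root = Path(root)
    used = set()
    for path in root.rglob("*"):
        parts = path.relative_to(root).parts
        if path.is_file() and parts[0] not in ("LICENSES", ".git"):
            used |= identifiers_in(path.read_text(encoding="utf-8", errors="ignore"))
    return sorted(i for i in used if not (root / "LICENSES" / (i + ".txt")).is_file())
```

Calling missing_license_texts("kturtle") should return an empty list once the LICENSES/ directory is complete.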

And as soon as it is approved and merged, another repository enters the shiny new world of REUSE compatible license statements. For more background about REUSE compatible license statements and the route we are following in KDE, you might want to have a look at the licensing howto wiki page.

SPDX for KF5/KF6 Status Update

Converting source files from traditional license headers to SPDX expressions is maybe best explained as being like a visit to the dentist: usually it is not the most appealing thing in the world, while being there it can be slightly unpleasant and tedious for both sides, but in the end you are quite happy that the work was done. This is quite similar to my experience with the KDE Frameworks sources. Since many of the files are older than 10 years and some even older than 20, you can find surprisingly different copyright statement styles. However, finally, after quite some months, task T11550 is done \o/

This small task tracks all the work that was done in the ~80 frameworks repositories, which finally state all copyright and license statements in machine readable, modern SPDX syntax. In total, my “grep -nr "SPDX-License-Identifier" | wc” command (not completely accurate, but the easiest way to get a general idea) reports about 7400 files that were converted. At this point, I especially want to thank Christophe Giboudeaux, who did most of the reviews of these changes. Even if we could do most of the conversions with tooling (see licensedigger, which is now in SDK Playground by the way), the whole conversion was quite time consuming because every change must be reviewed carefully.
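
The grep pipeline above counts matching lines, so a file that carries the tag more than once is counted several times. A slightly more precise count of files can be sketched as follows; the function name count_tagged_files and the throwaway demo files are assumptions for illustration only.

```python
import tempfile
from pathlib import Path

TAG = "SPDX-License-Identifier"


def count_tagged_files(root):
    """Count files below root that contain the SPDX tag at least once."""
    count = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and TAG in path.read_text(encoding="utf-8", errors="ignore"):
            count += 1
    return count


# Small self-contained demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "a.cpp").write_text("// SPDX-License-Identifier: LGPL-2.1-or-later\n")
    (demo_root / "b.h").write_text(
        "// SPDX-License-Identifier: MIT\n// SPDX-License-Identifier: MIT\n"
    )
    (demo_root / "README").write_text("no license tag here\n")
    demo_count = count_tagged_files(demo_root)

print(demo_count)  # 2: b.h is counted once although the tag appears twice
```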

So, are we done? Not yet! Actually, this is just the first step, which now enables cool new things. One of them, which I am currently working on, is to get some unit testing functionality in place that will allow checking that the outbound license of a library/application actually is legally compatible with the license statements in its sources… Something like this is only possible now, because the license information is machine readable.
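
The core of such a check could be as small as the following sketch. It assumes that each file carries a single SPDX identifier and that somebody has already decided which identifiers are acceptable for a given outbound license; the set allowed_inbound and the file names are purely hypothetical examples, not a statement of the planned implementation.

```python
def incompatible_files(file_licenses, allowed):
    """Return, sorted, the files whose license id is not in the allowed set."""
    return sorted(
        path for path, spdx_id in file_licenses.items() if spdx_id not in allowed
    )


# Hypothetical library that is released as LGPL-2.1-or-later.
allowed_inbound = {"LGPL-2.1-or-later", "BSD-2-Clause", "MIT"}
file_licenses = {
    "src/core.cpp": "LGPL-2.1-or-later",
    "src/helper.cpp": "MIT",
    "src/imported.cpp": "GPL-2.0-or-later",
}

print(incompatible_files(file_licenses, allowed_inbound))
```

A unit test would then simply assert that the returned list is empty for the repository at hand.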

Moreover, the conversion tasks are only done for frameworks, but there is massively more code in KDE. It is great to see that Christophe already converted many of the monthly released PIM repositories and that even the quite big KWin repository was converted recently. If anybody hits the limits of licensedigger and needs a new feature, please drop a feature request at invent.kde.org and I will try to make it possible.

REUSE Machine Readable License Information

Some weeks ago I wrote about SPDX identifiers and how they can be used to annotate source code files with machine readable license information. In this post, now I want to compile the things I learned after looking more deeply into this topic and how it might be applied to KDE.

SPDX identifiers are an important step towards letting tools read and check license information automatically. However, like most standards, the SPDX specification is quite general, cumbersome to read for many people, and allows many options for how to use the identifiers, while I, as a developer, just want a small howto that explains how I have to state my license information. Another point is that, in my experience, any source code annotation with machine readable information is pointless unless you have a tool that automatically checks its correctness. Otherwise, there is a big burden on code reviewers, who would have to check tiny syntactical requirements from a specification. If you look deeply into the license headers used in KDE (I did this), there is a shocking number of different license statements that often state exactly the same thing. This might be due to different formatting or typos, but also due to actual mistakes made when trying to pick the correct license for a use case, which somehow got mixed up.

REUSE.software

While doing research on best practices for applying machine readable license information, I was pointed to the REUSE.software initiative, which was started by the Free Software Foundation Europe (FSFE) to provide a set of recommendations to make licensing easier. What they provide is (in my opinion) a really good policy for how to uniformly state SPDX based license headers in source files, how to put license texts into the repository and a way to automatically check the syntactical correctness of the license statements with a small conformance testing tool.

I really like the simplicity of their approach, where I mean simplicity in the amount of documentation you have to read to understand how to do it correctly.

Meanwhile in KF5…

As already said, I want to see machine readable license information in KDE Frameworks in order to increase their quality and to make them easier to use outside of KDE. The first step to be done before introducing any system for machine readable identifiers is to understand what we have inside our repositories right now.

Disclaimer: I know that there are many license parsing tools out there in the wild and I know that several of them are even well established.

Yet, I looked into what we have inside our KF5 repositories and what has to be detected: Most of our licenses are GPL*, LGPL*, BSD-*-Clause, MIT or a GPL/LGPL variant with the “any later version accepted by the membership of KDE e.V. (or its successor approved by the membership of KDE e.V.), which shall act as a proxy […] of the license.” addition. After a lot of reasoning, I came to the conclusion that for the specific use case of detecting the license headers inside KDE projects (even focused only on frameworks right now) it makes most sense to have a small tool that only works for this use case. The biggest argument for me was that we need a way to deal with the many historic license statements from up to 20 years ago.

Thus, I started a small tool in a scratch repository, named it licensedigger and started the adventure of parsing all license headers of KDE Frameworks. Of all source files, I am done with about 90% right now. Yet, I have neglected to look into KHTML, KJS, KDE4LibsSupport and the other porting aid frameworks until now. Specifically for the Tier 1 and Tier 2 frameworks I am mostly done and even beasts like KIO can be detected correctly right now. I am still working on it to increase the number of headers that are detected.

The approach I took is the following:

  • For every combination of licenses there is one regular expression (which only has the task to remove whitespace and any “*” characters).
  • For every license check there is a unit test consisting of a plaintext license header and an original source code file that guarantees that the header is found.
  • Licenses are internally handled with SPDX markers.
  • For a new license or a license header statement for an already contained license, the license name must be stated multiple times to ensure that copy-paste errors with licenses are minimized.
  • It is OK if the tool only detects ~95% of the license headers, marks unknown headers clearly and requires that the remaining 2-3 files per repository have to be identified by hand.
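
As a rough illustration of the first points, the matching idea can be sketched in a few lines. This is not the actual licensedigger code: the names normalize, KNOWN_HEADERS and detect_license are made up for this example, and the only header registered here is one common wording of the GPL "version 2 or later" statement.

```python
import re

# Everything the matching should ignore: whitespace (including tabs) and "*".
NOISE_RE = re.compile(r"[\s*]+")


def normalize(text):
    """Strip whitespace and comment stars so that only the wording remains."""
    return NOISE_RE.sub("", text)


# Known header texts, keyed by their SPDX identifier.
KNOWN_HEADERS = {
    "GPL-2.0-or-later": normalize(
        "This program is free software; you can redistribute it and/or "
        "modify it under the terms of the GNU General Public "
        "License as published by the Free Software Foundation; either "
        "version 2 of the License, or (at your option) any later version."
    ),
}


def detect_license(source):
    """Return the SPDX id of the first known header found in the source text."""
    normalized = normalize(source)
    for spdx_id, header in KNOWN_HEADERS.items():
        if header in normalized:
            return spdx_id
    return "UNKNOWN-LICENSE"


# A header with comment stars and tab indentation still matches.
source = """/*
 *\tThis program is free software; you can redistribute it and/or
 *\tmodify it under the terms of the GNU General Public
 *\tLicense as published by the Free Software Foundation; either
 *\tversion 2 of the License, or (at your option) any later version.
 */
int main() { return 0; }
"""

print(detect_license(source))
```

Because line breaks, indentation and comment decoration are removed before comparing, differently formatted copies of the same statement map to the same SPDX marker, while anything else is clearly marked as unknown.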

At the moment, the tool can be run to provide a list of licenses for any folder structure, e.g. pointing it to “/opt/kde/src/frameworks/attica” or even to “/opt/kde/src/frameworks” will produce a long list of file names with their licenses. A next (yet simple) step will be to introduce a substitution mode that replaces the found headers with SPDX markers and further to add the license files in a way that is compatible with REUSE.

Please note that there was no discussion yet on the KDE mailing list if this (i.e. the REUSE way) is the right way to go. But I will send respective mails soon. This post is mostly meant to provide a little bit of background before starting a discussion, such that I can keep mails shorter.

FOSDEM 2017 & the QtWayland Compositor Framework

This will be a rather short blog post, but since I completely missed writing it before this year’s FOSDEM, just let me give you a short pointer to my current talk: This year, for the first time, I submitted a talk to the Embedded & Automotive DevRoom. If you think that this sounds crazy: what we see on modern embedded devices, like in cars or in even bigger machines, actually tends to reach a complexity similar to that of the good old Linux desktop environments. In terms of multiple processes, window compositing and UI requirements, a lot of such demands are already on the table…

Recently, I looked into the QtWayland Compositor framework, which is an awesome new tool if you want to create a small but use case specific Wayland compositor, as it is often the case in the embedded world. The framework was just released as stable API with Qt 5.8. If you want to read more, I just gave a talk about it yesterday:

Have fun 🙂

State of the KF5 Android CI

I would have liked to say, “Yeah the Android CI runs!” – But we are not there yet; pretty close actually, and close enough that it already makes sense to tell about it, yet a few last Jenkins settings remain to be done and real life issues cause this to take a few more days. So, I will give a short primer on what we prepared in Randa.

After tonight’s run of the KApiDox generation at https://api.kde.org, all KF5 libraries will correctly show whether they support Android or not. The current count of supported frameworks is 16 and since last year some important ones like KI18n, KCoreAddons and Threadweaver were added to the list. For all of them I can say that the build is tested in my Android cross-building Docker image together with Qt 5.6 and the Android CMake toolchain from Extra-CMake-Modules.

Obviously, the currently most pressing issue is to get a proper CI system for KF5 on Android into place. The very same Docker image, as named above, was extended to be utilized as the container for our future Docker based CI system. During the Randa week the container was already integrated into build-sandbox.kde.org and just waits there for the last infrastructure bits to be correctly set. The next step following the build coverage will be the integration of automatic unit tests. For e.g. Linux systems this is a boring topic, since one can just run tools like CTest. On Android the story is slightly more complicated. Our goal is to run unit tests in a scenario as close to real life as possible. That means, we want unit tests to be packaged as APK files, then installed and run in an Android emulator, and we also need to get the results back. This means, we need some wrapper code around the generated CTest files that does all the packaging, installing, result downloads, etc. The main parts of these scripts were finished during Randa and, as a proof of concept, the unit tests of the first tested framework, Attica, already pass on my devel machine \o/ Once we have the build coverage in place, getting the unit tests checked on the CI will be the next step.

I hope that all of these existing bits will come together soon. However, with QtCon approaching in just two months and me giving a talk about KF5 on Android, it sounds like there is a clear deadline for this integration to conclude 🙂

PS: the fundraiser campaign for Randa is still ongoing for 12 more days!

Progress of KDE on Android Infrastructure

It is 2015 and Android is a very important platform for (mobile) applications and developers. This somehow could also have been written a year ago, and actually it was stated then by several people. Those people also started porting some first applications from the KDE/Linux world to the Android platform. Yet, when looking at what happened during the last year, as of now, we only have KAlgebra, GCompris and (since recently) Marble Maps available on Android.

The interesting question is, what holds back the many KDE applications that would also fit on an Android platform? During this year’s Randa sprint we took the opportunity and sat together to exchange what we learned during the last year. Looking at the different approaches of porting applications to Android, we learned that merely setting up a build system is a far from trivial job and probably one of the main points that hold people back from playing around with Android. In addition, the availability of KDE Frameworks libraries was not really tested in detail yet, and without availability guarantees there is uncertainty about how easy a port to Android might be.

To overcome these problems, we start with some simple approaches:

  1. Provide a simple and easy to use build environment.
    From the several existing toolchains for building Android applications, we started to reduce the number of different ones within the KDE projects to one. The new general toolchain (provided for some time now via extra-cmake-modules) gained a new tutorial on how to use it. Further, by providing a build script for frameworks libraries, we make it easy to set up a whole build environment that can directly be used for porting KDE applications that use KF5.
  2. Make development easy for new people.
    Initial work was started to create a docker image as a simple to use SDK. The goal is: run one command and get a fully set up build environment. With this approach we follow the way as it was started for Plasma Mobile.
  3. Availability of KDE Frameworks 5.
    We started to look into which frameworks currently can be built for Android. The list is already notable: kconfig, kcompletion, kitemmodels, kitemviews, kcodecs, karchive, kguiaddons, kwidgetsaddons, attica, kdnssd, kapidox, kimageformats, and kplotting. For getting more frameworks built, the current two major blockers are building ki18n and kcoreaddons, which both need actual changes to the code to support the Android platform with its stripped down libc.

Looking at what was already achieved, the sprint itself was essential to get all people together to really work on this topic. As always, you can support this work by donating for KDE’s sprints.
Though the work is not yet done, the foundation is laid to post some interesting news in the next weeks.

Akademy Impressions

What a cool time! I am still thrilled, now two days home after Akademy and QtCS, which took place last week in Bilbao. Several great reports about what happened there and what was discussed can already be found on the net.

To avoid repeating everything, I just want to point out my personal highlights:

  • Björn and Thomas proposed in their talk a new UI concept: the “Flow”. Even if it is still a conceptual draft and it is unclear how we can implement it, I am really fascinated by this idea! Already on the way towards implementing this concept, we must rethink what our application boundaries are, where and how applications can interact and how can we create a user experience without distracting the user from what she/he wants to do.
  • Kevin announced Declarative Widgets that will significantly simplify transitions to QML2 based interfaces for applications that want to use QML but are currently still tied to Qt 4.8.
  • Peter had a great talk about Simon Speech Recognition: It is awesome what Peter is doing with his speech recognition system and what Simon offers. After several discussions, now I am also planning to utilize the Simon library for recordings inside Artikulate (by this we directly solve our build problems on several platforms 🙂).
  • It was awesome to meet so many faces and to talk to so many people that I previously only knew from mail/IRC. Even if you only had a short chat with someone, in my opinion it makes talking on IRC so much more personal, since you really know who is sitting at the other end of the wire.
  • Thanks to all the people who gave so much positive feedback about our “Artikulate Project” and especially to those who offered to contribute recordings! I also had a lot of discussions about the process of language learning and the interactions of our existing applications (several blog posts about this topic will follow). Special thanks go to Dimitris, who recently started to create a Greek language course!

And last but not least: Thanks to the local organizer team, you really made this Akademy rock!

A New Blog, a First Entry, a Few Days in Rome

It has been a long time since I wrote my last blog entry. I think it was back when we had that blizzard in Oklahoma in December 2006 and we, that is Sanja, Tanja, Sebastian and I, tried out the jacuzzi in the snow… Well, my old blog gave up the ghost at some point back then, and I had somehow forgotten to make a backup; let's see how this one fares.

Since early Thursday morning, 6:23 a.m. to be precise, I have been travelling through Germany and Europe again. After already attending the 38.5th KIF in Darmstadt last week, I am now heading a bit further south, to Rome, before flying on to Magdeburg next week to take in the 67th KoMa as well. Why I am in Rome at all is actually quite simple: the FRONTS experiment workshop takes place here starting Tuesday, I will take part in it, and several more people from Paderborn will show up in Rome for it. And since I am currently (more or less) just waiting for December to come so that I have something to do again, it fit wonderfully into my monthly planning to schedule a few days of vacation in Rome.

As I type these lines, I am sitting in a small B&B southwest of Stazione Termini with the nice name "Family House". It is indeed quite familial here, cheap as well, and it even has a considerably higher standard than the White House Hostel of New York back then (maybe some of you still remember it? Here the rooms even have ceilings and I have my own (sic!) bathroom). The one, quite substantial problem: there is no Internet in the house. And in the nearest Internet café I can only go online with their computers, not with my laptop. The result will be 4-5 days of Internet abstinence such as I have not experienced in years. But then again, maybe this is a whole new experience: what is it actually like when you do not check your mail 10 hours a day, and do not immediately look things up on dictionary.com when unsure about a word or on thesaurus.com for particularly elegant phrasing? Maybe this is also a bit of training for the brain again. Maybe this experience will even be worth a post of its own.

But one substantial problem remains: what I had actually planned, and even promised to some people, namely writing a daily blog with photos and such things, will probably not work out. But maybe that is not so bad after all. The alternative plan I have now made is to sort all the experiences from Rome a bit by topic and to write one post per topic. At the moment, I have topics in mind like "Eating and Drinking", "Rain, Not Only a Paderborn Phenomenon" or "Haven't I Been Here Three Times Already?!". Let's see what the days bring… I can already write up the ideas in the evenings at the hostel, and if I really have Internet at the next hotel next week, I can then publish them one after another.