
Content Analysis for Software Documentation

CADo: Content Analysis for Software Documentation is a research project in cooperation with the SWEVO Group (Prof. Robillard) at McGill University, Montreal, Canada.
The project is funded by, among others, the Alexander von Humboldt Foundation.

CADo in Brief

Reading documentation is an important part of developing and managing software. For example, reference documentation complements Application Programming Interfaces (APIs) by providing information that is not obvious from the syntax of the API. To improve the quality of software documentation and the efficiency with which the relevant information it contains can be accessed, we must first understand its content.

This project comprises a series of studies on the nature and organization of the knowledge contained in software documentation. The first phase studies and compares thousands of API reference documentation units provided as part of two major technology platforms: Java SDK 6 and .NET 4.0. The results include (a) a description of knowledge patterns based on grounded methods and independent empirical validation, (b) insights for the design of documentation retrieval systems, the improvement of documentation quality, and the management of technical knowledge, as well as (c) open source tools for systematically studying software documentation.

Content Analysis Tool

CADo is an open source tool for conducting content analysis of API documentation. CADo has two modes: an admin mode and a coder mode. The main features of the admin mode include:

  • Extract API Documentation from web resources
  • Create random and stratified samples
  • Create a coding scheme
  • Manage coders
  • Create random assignments to coders
  • Calculate inter-coder agreement
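
The sampling and assignment features listed above can be illustrated with a minimal Python sketch. It is not CADo's implementation: the function names, the choice of element kind as the stratum, the sample sizes, and the coder names are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, per_stratum, seed=0):
    """Draw up to `per_stratum` units at random from each stratum."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for key in sorted(strata):  # sorted for a deterministic stratum order
        members = strata[key]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


def assign_to_coders(sample, coders, coders_per_unit=2, seed=0):
    """Randomly assign each unit to `coders_per_unit` distinct coders."""
    rng = random.Random(seed)
    assignments = {coder: [] for coder in coders}
    for unit in sample:
        for coder in rng.sample(coders, coders_per_unit):
            assignments[coder].append(unit)
    return assignments


# Hypothetical documentation units: (element name, element kind).
units = (
    [(f"Type{i}", "class") for i in range(50)]
    + [(f"method{i}", "method") for i in range(200)]
    + [(f"FIELD{i}", "field") for i in range(30)]
)

# Stratify by element kind so that rare kinds are not swamped by methods.
sample = stratified_sample(units, stratum_of=lambda u: u[1], per_stratum=10)
assignments = assign_to_coders(sample, ["coder_a", "coder_b", "coder_c"])
```

Assigning every unit to at least two coders, as in this sketch, is what makes a later agreement calculation possible.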

The main features of the coder mode include:

  • Online and offline login
  • Load assignments
  • Add codes
  • See the coding guide
  • Render the documentation
  • Hibernate and resume coding sessions
  • See coding statistics

The figure below shows a screenshot of the CADo client used by coders. The single coding window includes a view of the documentation unit (A), containment and structural information about the associated element (B), 12 checkboxes corresponding to the 12 knowledge type variables (C), and a tool-tip window showing the description of a knowledge type extracted from the coding guide (D).
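
Each checkbox yields one binary variable per documentation unit, so agreement between two coders can be computed separately for each knowledge type. This page does not state which agreement measure CADo calculates. The sketch below assumes Cohen's kappa, a common choice for two coders, and uses made-up codes for a single hypothetical knowledge type.

```python
def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders rating the same units with 0/1 codes."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must rate the same non-empty set of units")
    n = len(codes_a)
    # Observed agreement: share of units on which both coders chose the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance, from each coder's rate of ticking the box.
    p_a = sum(codes_a) / n
    p_b = sum(codes_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        # Both coders gave the same constant code; kappa is undefined here,
        # and returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes from two coders for one knowledge type on ten units
# (1 = checkbox ticked, 0 = not ticked).
coder_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
coder_b = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
kappa = cohens_kappa(coder_a, coder_b)
```

Here the coders agree on 8 of 10 units (observed agreement 0.8) while chance agreement is 0.52, so kappa is roughly 0.58. Repeating the calculation for each of the 12 variables gives one agreement value per knowledge type.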


Study Phases


Coding Guide

The coding guide can be downloaded here:

API Knowledge Coding Guide Version 7.2

Data Sets



Additional Material


This work has been made possible by the generous support of the Alexander von Humboldt Foundation.
We thank the coders who participated in this study: Yam Chhetri, Vincenz Doelle, Aparna Halder, Zardosht Hodaie, Taha Koltukluoglu, Amel Mahmuzic, Afaq Mustafa, Hoda Naguib, Nitesh Narayan, Helmut Naughton, Dennis Pagano, Enrique Perez, Tobias Roehm, Blagina Simeonova, and Alexander Waldman. We also thank Yam Chhetri, Barthelemy Dagenais, Helmut Naughton, Dennis Pagano, Peter Rigby, and Gias Uddin for their comments and suggestions.