Friday, 24 April 2015

Doing quantitative archaeology with open source software

This short post is written for archaeologists who frequently perform common data analysis and visualisation tasks in Excel, SPSS or similar commercial packages. It was motivated by my recent observations at the Society for American Archaeology meeting in San Francisco - the largest annual meeting of archaeologists in the world - where I noticed that the great majority of archaeologists use Excel and SPSS. I wrote this post to describe why those packages might not be the best choices, and to explain what one good alternative might be. There’s nothing specifically about archaeology in here, so this post is likely to be relevant to researchers in the social sciences in general. It’s also cross-posted on the Software Sustainability Institute blog.

Prevailing tools for data analysis and visualization in archaeology have severe limitations

For many archaeologists, the standard tools for any kind of quantitative analysis include Microsoft Excel, SPSS, and for more exotic methods, PAST. While these programs are widely used, they have a few limitations that are obvious to anyone who has worked with them for a long time, and that raise the question of what alternatives are available. Here are three key limitations:
  • File formats: each program has its own proprietary format, and while there is some interoperability between them, we cannot open their files in any program that we wish. And because these formats are controlled by companies rather than a community of researchers, we have no guarantee that the Excel or SPSS file format of today will be readable by any software 10 or 20 years from now. 
  • Click-trails: the main interaction with these programs is using the mouse to point and click on menus, windows, buttons and so on. These mouse actions are ephemeral and unrecorded, so many of the choices made during a quantitative analysis in Excel are undocumented. When researchers want to retrace the steps of their workflow days, months or years after the original effort, they depend on their memory or on some external record for many of the choices made in the analysis. This can make it very difficult for another person to understand how an analysis was conducted, because many of the details are not recorded.
  • Black boxes: the algorithms that these programs use to generate results are not available for convenient inspection by the researcher. The programs are a classic black box, where data and settings go in and a result comes out, as if by magic. For moderately complicated computations, this can make it difficult for researchers to interpret their results, since they do not have access to all of the details of the computation. This black box design also limits the extent to which the researcher can customise or extend built-in methods to new applications.

How to overcome these limitations?

For a long time archaeologists had few options to deal with these problems because there were few alternative programs. The general alternative to using a point-and-click program is writing scripts to program algorithms for statistical analysis and visualisations. Writing scripts means that the data analysis workflow is documented and preserved, so it can be revisited in the future and distributed to others for them to inspect, reuse or extend. For many years this was only possible using ubiquitous but low-level computer languages such as C or Fortran (or exotic higher level languages such as S), which required a substantial investment of time and effort, and a robust knowledge of computer science. In recent years, however, there has been a convergence of developments that have dramatically increased the ease of using a high level programming language, specifically R, to write scripts to do statistical analysis and visualisations. As an open source programming language with special strengths in statistical analysis and visualisations, R has the potential to be a solution to the three problems of using software such as Excel and SPSS. Open source means that all of the code and algorithms that make the program operate are available for inspection and reuse, so that there is nothing hidden from the user about how the program operates (and the user is free to alter their copy of the program in any way they like, for example, to increase computation speed).

Three reasons why R has become easier to use

Although R was first released in 1993, it has only been in the last five years or so that it has really become accessible and a viable option for archaeologists. Until recently, only researchers steeped in computer science and fluent in other programming languages could make effective use of R. Now the barriers to getting started with R are very low, and archaeologists without any background with computers and programming can quickly get to a point where they can do useful work with R. There are three factors that are relevant to the recent increase in the usability of R, and that any new user should take advantage of:
  • the release of an Integrated Development Environment, RStudio, especially for R
  • the shift toward more user-friendly idioms of the language resulting from the prolific contributions of Hadley Wickham, and 
  • the massive growth of an active online community of users and developers from all disciplines.
1. RStudio

For the beginner user of R, the free and open source program RStudio is by far the easiest way to quickly get to the point of doing useful work. First released in 2011, it has numerous conveniences that simplify writing and running code, and handling the output. Before RStudio, an R user had little more than a blinking command line prompt to work with, and might struggle for some time to find efficient methods for getting data in, running code (especially more than a few lines), and then getting data and plots out for use in reports, etc. With RStudio, the barriers to doing these things are lowered substantially. The biggest help is having a text editor right next to the R console. The text editor is like a plain text editor (such as Notepad on Windows), but has many features to help with writing code. For example, it is code-aware and automatically colours the text to make it a lot easier to read (functions are one colour, objects another, etc.). The code editor has a comprehensive auto-complete feature that shows suggested options while you type, and gives in-context access to the help documentation. This makes spelling mistakes rare when writing code, which is very helpful. There is a plot pane for viewing visualisations, with buttons for saving them in various formats, and a workspace pane for inspecting data objects that you've created. These kinds of features lower the cognitive burden of working with a programming language, and make it easier to be productive with a limited knowledge of the language.

2. The Hadleyverse

A second recent development that makes it easier for a new user to be productive using R is a set of contributed packages affectionately known in the R user community as the Hadleyverse. User-contributed packages are add-on modules that extend the functionality of base R. Base R is what you get when you download R, and while it is a complete programming language, the 6000-odd user-contributed packages provide ready-made functions for a vast range of data analysis and visualization tasks. Because the large number of packages can make discovering relevant ones challenging, they have been organised into 'task views' that list packages relevant to specific areas of analysis. There is a task view for archaeology, providing an annotated list of R packages useful for archaeological research. Among these user-contributed packages are a set by Hadley Wickham (Chief Scientist at RStudio and adjunct Professor at Rice University) and his collaborators that make plotting better, simplify common data analysis activities, speed up importing data into R (including from Excel and SPSS files), and improve many other common tasks. The overall result is that for many people, programming in R is shifting from the base R idioms to a new set of idioms enabled by Wickham's packages. This is an advantage for the new user of R because code written with Wickham's packages is easier for people to read, as well as being highly efficient to compute. This is because these packages simplify many common tasks (so the user doesn't have to specify exotic options if they don't want to), use common English verbs ('filter', 'arrange', etc.), and use pipes. Pipes mean that functions are written one after the other, in the order they would appear in if you explained the code to another person in conversation.
This is different from the base R idiom, which doesn't have pipes and instead either nests functions inside each other, so that they must be read from the centre (the inside of the nest) outwards to the left (the outside of the nest), or stores each intermediate result in a temporary object. Both are counter-intuitive for most people new to programming.
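The three styles are easiest to compare side by side. The sketch below is not R: it is a language-neutral illustration written in Python, in which a small helper called `pipe` (hypothetical, defined here only for this example) plays the role that the `%>%` operator from the magrittr package plays in code written with Wickham's dplyr. The verb-like functions `keep_site` and `arrange_by_length` and the artefact data are likewise invented for illustration. All three styles compute the same result.

```python
from functools import reduce

# Toy data invented for illustration: (site, artefact length in mm).
artefacts = [("A", 42.0), ("B", 17.5), ("A", 30.0), ("C", 55.0), ("B", 21.0)]


def keep_site(site):
    """Return a step that keeps only rows from one site (a 'filter'-like verb)."""
    return lambda rows: [row for row in rows if row[0] == site]


def arrange_by_length(rows):
    """Sort rows by length, shortest first (an 'arrange'-like verb)."""
    return sorted(rows, key=lambda row: row[1])


def lengths(rows):
    """Pull out just the length column."""
    return [row[1] for row in rows]


def pipe(value, *steps):
    """Pass value through each step in turn, from left to right."""
    return reduce(lambda acc, step: step(acc), steps, value)


# 1. Nested calls: must be read from the innermost call outwards.
nested = lengths(arrange_by_length(keep_site("A")(artefacts)))

# 2. Temporary objects: readable, but every step needs a throwaway name.
site_a = keep_site("A")(artefacts)
site_a_sorted = arrange_by_length(site_a)
with_temps = lengths(site_a_sorted)

# 3. Pipe: steps appear in the order you would describe them aloud.
piped = pipe(artefacts, keep_site("A"), arrange_by_length, lengths)

print(piped)  # [30.0, 42.0]
```

In the piped version the steps read left to right in the same order as the spoken description ("take the artefacts, keep site A, sort by length, pull out the lengths"), which is the readability gain described above.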

3. Big open online communities of users

A third major factor in the improved accessibility of R to new users is the growth of active online communities of R users. There has long been an email list for R users, but more recently, user communities have formed around websites such as Stackoverflow. Stackoverflow is a free question-and-answer website for programmers using any language. The unique concept is that it gamifies the process of asking and answering questions, so that if you ask a good question (i.e. well-described, and including a small self-contained example of the code that is causing the problem), other users can reward your effort by upvoting your question. High-quality questions can attract very quick answers because of the size of the community active on the site. Similarly, if you post a high-quality answer to someone else's question, other users can recognise this by upvoting your answer. These voting processes make the site very useful even for the casual R user searching for answers (who may not care about voting), because they can identify the high-quality answers by the number of votes they've received. It's often the case that if you copy and paste an error message from the R console into the Google search box, the first few results will be Q&A pages on Stackoverflow. This is a very different experience from using the r-help email list, where help can come slowly, if at all, and from searching the email list archives, where it's not always clear which is the best solution. Another useful output of the online community of R users is the many blogs that document how to conduct various analyses or produce visualizations (some 500 blogs are aggregated in one place). The key advantage of Stackoverflow and blogs, aside from their free availability, is that they very frequently include enough code for the casual user to reproduce the described results. They are like a method exchange, where you can collect a method in the form of someone else's code, and adapt it to suit your own research workflow.

There's no obvious single explanation for the growth of this online community of R users. Contributing factors might include a shift from SAS (a commercial product with licensing fees) to R as the software used to teach students in many academic departments, due to the Global Financial Crisis of 2008, which forced budget reductions at many universities. This led to a greater proportion of recent graduates being R users. The flexibility of R as a data analysis tool, combined with the rise of data science as an attractive career path and the demand for data mining skills in the private sector, may also have contributed to the overlap between people who are active online and people who use R, since so many of the user-contributed packages are focused on statistical analyses.

So What?

The prevailing programs used for statistical analyses in archaeology have severe limitations resulting from their corporate origins (proprietary file formats, uninspectable algorithms) and mouse-driven interfaces (impeding reproducibility). The generic solution is an open source programming language with tools for handling diverse file types and a wide range of statistical and visualization functions. In recent years R has become a very prominent and widely used language that fulfils these criteria. Here I have briefly described three recent developments that have made R highly accessible to the new user, in the hope that archaeologists who are not yet using it might adopt it as a more flexible and useful program for data analysis and visualization than their current tools. Of course it is quite likely that the popularity of R will rise and fall like that of many other programming languages, and ten years from now the fashionable choice may be Julia or something that hasn't even been invented yet. However, the general principle that a scripted analysis using an open source language is better for archaeologists, and for science generally, will remain true regardless of the details of the specific language.

Wednesday, 22 April 2015

Rediscovering Ancient Identities in Kotayk (Armenia)

In recent years crowdfunding has become a new resource in archaeology, providing support to projects that have difficulty financing the many research activities connected with historical investigations in general.
Although our team has not yet tested the true potential of this system, today I would like to help some colleagues and friends who decided to experiment with this way of funding for their expedition in the Kotayk region (Armenia).
Their mission started in summer 2013 and aims "to register and study all the archaeological sites along the upper Hrazdan river basin, in the Armenian province of Kotayk. The project is organized by the Institute of Archaeology and Ethnography of the Academy of Sciences of the Republic of Armenia, the International Association of Mediterranean and Oriental Studies (ISMEO) and the Italian Foreign Affairs Minister." So far the team has achieved some remarkable results, locating 56 historical/archaeological sites and starting an excavation in the well-preserved Iron Age fortress of Solak.
If you want to support their effort in recording and analyzing archaeological evidence in the Kotayk region, you can find more details on their official Indiegogo page.
I personally met most of the team members (Dr. Manuel Castelluccia, Dr. Roberto Dan and Dr. Riccardo La Farina) between 2010 and 2011, when they joined the missions of Aramus (Armenia) and Khovle Gora (Georgia), in which I was working with Arc-Team for the Institut für Alte Geschichte und Altorientalistik.

The visit to the city of Vardzia, during the mission in Khovle Gora (2011)

There I could appreciate their commitment and professionalism. For this reason I wish the Kotayk Survey Project a very successful 2015 mission, and I hope to hear more about this interesting project here in ATOR soon!

A moment of relaxation during the mission in Khovle Gora (2011)

Monday, 6 April 2015

Archaeological Forensic Facial Reconstruction with FLOSS

Last week the CAA 2015 conference (Computer Applications and Quantitative Methods in Archaeology) took place in Siena (Italy). It was a good occasion to meet old friends, share opinions and speak with colleagues from all over the world.
This year Arc-Team participated with three oral presentations and a poster and, of course, we will share these contributions under free licenses (CC-BY) here in ATOR as well.
Today I am uploading the poster, which is self-explanatory thanks to the text we added to summarize our experience with Archaeological Forensic Facial Reconstruction (FFR). If you are a regular reader of ATOR, there will be little news for you about our work, but you will find some extra content that we have not yet had time to share on our blog (e.g. a gallery of some of the reconstructions for the open source exhibition "Facce. I molti volti della storia umana"; the video of the FFR of St. Anthony, presented during the "Giugno Antoniano"; and the Mocap experiment with Francesco Petrarca).

Here is the poster, I hope you will find it useful:

Poster at CAA 2015 (Siena - Italy)
Have a nice day!

Sunday, 29 March 2015

The horizons of the exhibit “FACES”: anthropological context and applications in medicine

The exhibition FACES. The Many Visages of Human History is, in its own way, a milestone in the work that Arc-Team, the Museum of Anthropology at the University of Padua and Antrocom NPO are doing together.

The reconstructions of the faces of hominins, of St. Anthony and the Blessed Luca Belludi, and of Francesco Petrarca and Giambattista Morgagni are the outcome of research that has lasted for months and still continues, and that is intended to expand into other areas of interest.

In fact, the exhibition offers visitors the opportunity to reflect on concepts meaningful to anthropology, such as diversity, self-perception and identity, from both a historical and a contemporary point of view, but it also reflects the continuous testing of technologies that open new perspectives in different areas of anthropological research.

Following on from the topics of the exhibition, there is no doubt that self-perception and diversity are important parameters in medical anthropology, especially when they are assessed in the light of new technologies, and of 3D printing applied in medicine in particular.

Consider, for example, prostheses that can be built, or even printed, in a relatively short time and custom-made for the patient. There are many examples of this: the custom-made mandible for an 83-year-old woman, the cranium completely replaced in a 22-year-old Dutch patient, or the production of living organs, such as liver, tracheal cartilage and ears, directly from living cells.

Moreover, forensic reconstruction is a valuable tool in reconstructive surgery: examples of its application in this context are the reconstruction of the face of Albert of Trento using open source software, and of the face of a child mummy preserved at the Saint Louis Art Museum.

I shall focus in particular on an application developed by Cicero Moraes for the treatment of developmental dysplasia of the hip (DDH), a congenital malformation of newborns that can be treated with a Pavlik harness or a special plaster cast (hip spica cast). In severe cases, orthopedic surgery may even be needed.

This treatment requires continuous monitoring of the patient because of its complications: pain, increased temperature and skin lesions. Moraes, together with researchers Munhoz, Kunkel and Tanaka, has developed an alternative to the common orthosis based on photogrammetric scanning of the hip, in order to replicate the exact geometry of the anatomical part, reducing costs and time and avoiding complications for the patient.

The aims of our research are gradually expanding and we know that there is still a lot of work to do. It's a good thing, however, that we stopped for a moment to take stock of the situation and to recognize that we are helping to improve the state of affairs, not only in archaeology and anthropology but also in other fields, thanks to the scope of what we are doing. This result was achieved by motivated individuals who, despite living in geographic areas far apart, have joined efforts to reach a common goal by sharing data and projects.

Monday, 16 March 2015

Arc-Team: Conflict Archaeology Workflow

Since 2011 Arc-Team has been working in the field of modern conflict archaeology.
The most recent step was the conclusion of our European Project (Interreg IV), during which we had the opportunity, for the first time, to document over a large area the military remains of both conflicting parties: the Austro-Hungarian Army and the Italian Royal Army.
We filmed each step of the archive studies, field work, data processing and tests for future applications of the collected data.
The result is a 4:22-minute video clip, which we want to share with you:

We will soon also post some scientific details and results of the project.

Saturday, 7 March 2015

Pre-release of free Portuguese e-book: Manual of Digital 3D Facial Reconstruction

3D Designer Cicero Moraes (Arc-Team/Ebrafol) and Forensic Dentist Paulo Miamoto (Ebrafol) are working on the final stages of their e-book, "Manual of Digital 3D Facial Reconstruction - Applications with free software and open source". Although they have been researching open software applied to forensic and archaeological contexts for some years, the duo gained international attention last year, when they took part in the facial approximation of Saint Anthony. The facial reconstruction of Mary Magdalene is their latest project to draw the media's attention.

Moraes and Miamoto now coordinate a non-profit NGO, the Brazilian Team of Forensic Anthropology and Dentistry (Ebrafol, its acronym in Portuguese). The main objective of Ebrafol is the promotion of Human Rights through the application of the aforementioned sciences. They highlight that the tools used in historical projects like Mary Magdalene's are the same ones that can be applied to help human identification, and possibly contribute to solving cases of disappeared individuals.

Moraes and Miamoto now plan to share their working methodology with free software in their e-book. "Unlike what most people may think at first sight, we actually want as many people as possible to learn all about our techniques and apply them to their needs, which might actually help people around them. And that's why the e-book will be free", explains Cicero. In this work, the reader will be taught facial reconstruction from square one, from digitizing a skull with photogrammetry up to rendering the modeled face. It is intended for absolute beginners, as Dr. Miamoto explains: "I didn't have a solid background on 3D Modeling during my regular studies, it was a slow but constant learning process. Therefore it is important for us that this book is easily comprehensible by professionals of backgrounds other than computer sciences, like dentists, anthropologists, pathologists, archaeologists and anyone with a will to learn." Open software such as Python Photogrammetry Toolbox with Graphic User Interface (PPT-GUI), MeshLab, MakeHuman and Blender are some of the applications explained in tutorials throughout the chapters.

To receive your copy of the book, please fill out the online request form and a copy will be sent to you soon, as the book is in its final review phase. Although it is still in Portuguese, the authors expect to release Spanish and English translations later.

Tuesday, 3 March 2015

Project Tovel part 3: georeferencing historical maps

In many archaeological GIS projects a very important step is the study of historical maps. During Project Tovel this stage has been a primary target, being closely related to the 3D reconstruction of the underwater surface of the lake. In fact, one of the best sources for the bathymetry of Tovel is the plan drawn between 1937 and 1938 by Edgardo Baldi (director of the "Istituto italiano di idrobiologia Dott. Marco De Marchi" of Pallanza, now incorporated into the National Research Council Institute of Ecosystem Study).
To import Baldi's map into my GIS, I simply used the "georeferencer" tool of QGIS, taking as reference the corresponding CTP (Carta Tecnica Provinciale, the provincial technical map) that I had loaded previously. The short video tutorial below describes this operation:
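Behind a tool like this, georeferencing comes down to fitting a transformation that maps pixel positions on the scanned map to real-world coordinates, using ground control points (GCPs) picked on both the historical map and the reference map. The sketch below assumes the simplest useful case, a first-order polynomial (affine) transformation fitted by least squares, which is one of several transformation types offered by the QGIS Georeferencer. It is not QGIS's own code: the function names and the control points in the usage example are hypothetical, and a hand-drawn plan may need a non-linear model instead.

```python
import math


def solve3(m, v):
    """Solve the 3x3 system m @ p = v by Gaussian elimination with partial pivoting."""
    a = [list(row) + [v[i]] for i, row in enumerate(m)]  # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:  # treat a (near-)zero pivot as degenerate
            raise ValueError("control points are collinear or otherwise degenerate")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    p = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back substitution
        p[r] = (a[r][3] - sum(a[r][c] * p[c] for c in range(r + 1, 3))) / a[r][r]
    return p


def fit_affine(gcps):
    """Least-squares affine fit from GCPs given as ((px, py), (X, Y)) pairs.

    Model: X = a*px + b*py + c and Y = d*px + e*py + f.
    Returns (a, b, c, d, e, f).
    """
    if len(gcps) < 3:
        raise ValueError("an affine fit needs at least 3 control points")
    ata = [[0.0] * 3 for _ in range(3)]  # normal matrix A^T A
    atx = [0.0] * 3                      # A^T X
    aty = [0.0] * 3                      # A^T Y
    for (px, py), (X, Y) in gcps:
        row = (px, py, 1.0)
        for i in range(3):
            atx[i] += row[i] * X
            aty[i] += row[i] * Y
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    a, b, c = solve3(ata, atx)
    d, e, f = solve3(ata, aty)
    return a, b, c, d, e, f


def apply_affine(params, px, py):
    """Map a pixel position on the scanned map to map coordinates."""
    a, b, c, d, e, f = params
    return a * px + b * py + c, d * px + e * py + f


def rms_residual(params, gcps):
    """Root-mean-square distance between transformed GCPs and their target coordinates."""
    total = 0.0
    for (px, py), (X, Y) in gcps:
        tx, ty = apply_affine(params, px, py)
        total += (tx - X) ** 2 + (ty - Y) ** 2
    return math.sqrt(total / len(gcps))


# Hypothetical GCPs: four pixel corners of a scan and made-up map coordinates.
gcps = [
    ((0, 0), (1000.0, 5000.0)),
    ((100, 0), (1050.0, 5000.0)),
    ((0, 100), (1000.0, 4950.0)),
    ((100, 100), (1050.0, 4950.0)),
]
params = fit_affine(gcps)
print(apply_affine(params, 50, 50))  # approximately (1025.0, 4975.0)
print(rms_residual(params, gcps))    # approximately 0 for these exact points
```

With more than three control points the fit is overdetermined, and the residuals show how well the points agree with each other under the chosen model. A low residual does not, by itself, guarantee that the historical survey was accurate.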

Have a nice day!
This work is licensed under a Creative Commons Attribution 4.0 International License.