Solve at useR! 2018

Tom Carmichael

Going to an open-source, coding-based conference with a grand total of one geology-themed talk over four days might have seemed like a dicey proposition for Solve Geosolutions, but useR! 2018 was worth the trip to Brisbane many times over (not least as a respite from Melbourne’s weather).

The conference proper was preceded by a day of tutorials covering everything from best practices when imputing missing values (from Julie Josse, an author of the excellent reference R for Statistics) through to the easiest way to extend both xgboost and mxnet (run by Tong He, one of the authors of the xgboost R package). The main conference was excellent and covered just about every aspect of how R is being used to solve interesting problems, along with some of the strengths of the R community in general and what the future of R might look like.

In no particular order, a few of the Solve team’s favourite talks from across the week were:

Enabling Analysts: Embracing R in a National Statistics Office – Chris Hansen from Stats NZ spoke about making the business case for moving from Stata and SPSS to R and the practicalities of making that move, including the idea of a central R server that everyone uses instead of local instances. He also covered the challenges they faced when integrating R into existing workflows and how they addressed them.

clustree: a package for producing clustering trees using ggraph – Luke Zappia presented a novel way to integrate the results from several different clustering solutions and use them to help identify the optimum number of clusters for a given dataset.

DALEX will help you to understand your predictive model – Przemyslaw Biecek, from the Warsaw University of Technology, has produced a new package for interrogating predictive models called DALEX. This package is algorithm agnostic and aims to remove some of the mystery behind black box algorithms such as Random Forest and XGBoost. DALEX is particularly useful for investigating where a model struggles and where it excels.

Speeding up computations in R with parallel programming in the cloud – David Smith. foreach is an amazing R package that lets the independent iterations of a simple for loop run in parallel. Microsoft demonstrated an R package that makes it (almost) as simple to send large jobs off to an Azure cluster.

maxcovr: Find the best locations for facilities using the maximal covering location problem – Nicholas Tierney. Where should you put the next 50 WiFi hotspots to best service the public transport system? What about installing a few more AED resuscitation kits around a sporting complex? These are the questions that Nicholas Tierney’s fantastic maxcovr package can answer. It’s not too much of a stretch to see where these principles can be applied to mining and exploration.
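For readers who have not met the maximal covering location problem before, here is a minimal sketch of the idea, written in Python for illustration. It uses a simple greedy heuristic, which is an assumption on our part and not a description of how maxcovr solves the problem; the function name, the coordinates and the coverage radius are all hypothetical. Greedy selection is not guaranteed to find the optimal set of sites, but it shows what "covering the most demand with a fixed number of facilities" means.

```python
from math import hypot


def greedy_max_cover(demand_points, candidate_sites, radius, n_facilities):
    """Greedy heuristic for the maximal covering location problem.

    Hypothetical helper for illustration only: repeatedly pick the
    candidate site that covers the most demand points not yet covered.
    """
    # For each candidate site, record the indices of demand points within radius.
    coverage = {
        site: {
            i
            for i, (x, y) in enumerate(demand_points)
            if hypot(x - site[0], y - site[1]) <= radius
        }
        for site in candidate_sites
    }

    chosen, covered = [], set()
    for _ in range(n_facilities):
        # Site that adds the largest number of newly covered demand points.
        best = max(coverage, key=lambda s: len(coverage[s] - covered), default=None)
        if best is None or not (coverage[best] - covered):
            break  # no remaining site adds any new coverage
        chosen.append(best)
        covered |= coverage[best]
        del coverage[best]
    return chosen, covered


# Made-up example: six demand points, four candidate sites, two facilities.
demand = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (20, 20)]
sites = [(0.5, 0.5), (10.5, 10), (20, 20), (5, 5)]
chosen, covered = greedy_max_cover(demand, sites, radius=1.0, n_facilities=2)
print(chosen, len(covered))
```

Swap the demand points for drill collars or sample locations and the candidate sites for possible camp or equipment locations, and the link to exploration becomes easy to picture.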

And in an odd coincidence, in the middle of the last day of the conference, an article in the Sydney Morning Herald quoted the CEO of Rio Tinto, Jean-Sebastien Jacques, as saying:

“It is absolutely clear that technology, automation, artificial intelligence and digitalisation will play a more important role across industry, and it’s fair to say, in Australia today, it is difficult to find data scientists”

That may or may not be true. But having attended a conference where a cognitive psychologist has gone on to become a senior data scientist at Booking.com, it is worth asking: should this be a model for the mining industry going forward?

If you’re interested in any of the talks that were given during useR! 2018, the talks list is here, and the vast majority have been made available to watch here.

Tom, Mark and Liam at #user2018.