Available Applications

Physical Activity CoDa Regression Model

Investigate the association between physical activity composition and a health outcome directly (and obtain the dose-response).


https://opencoda.shinyapps.io/CoDaLinearRegression/   

Physical Activity CoDa MANOVA

Categorise individuals by outcome and investigate how the categories differ in terms of the composition of physical activity.


https://opencoda.shinyapps.io/CoDaMANOVA/

Physical Activity CoDa Regression Model

This is our own tool developed in RStudio and made available through Shiny. It allows users to carry out standard linear regression analysis with compositional explanatory variables (and other standard covariates). The tool has been released recently, and is in an early stage of development, but feel free to give it a go, and let us know if you experience any problems.


Time permitting, we have a number of new features we would like to add to the tool (in particular producing ternary plots and modelling the results of isotemporal substitution on the overall population), in addition to improving the UI. Please let us know if you have any other suggestions.


The tool is freely available and we hope it is valuable to researchers in physical activity. Unfortunately we do not have the time and resources to provide full product support, or to carry out exhaustive product testing, and at this stage of development it is used on an "own-risk" basis. Nevertheless, we believe it will be a useful exploratory tool, and could provide an independent check on your own work.


( https://opencoda.shinyapps.io/physical_activity_coda_regression_model/ )

Brief Guidance on using PACRM

1. Read_Data


Data must be uploaded in .csv format. Any "data cleaning" should be done outside of the tool. 

In particular, the tool may not recognize all "n/a" indicators. We recommend such fields be left as blanks, or the records omitted entirely.
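
As an illustration of cleaning outside the tool, the sketch below blanks out common missing-value markers in a .csv file before upload, and can optionally drop incomplete records. The list of markers and the column names are assumptions made for the example, not something the tool prescribes.

```python
import csv
import io

# Hypothetical list of missing-value markers; adjust it to match your own data.
NA_MARKERS = {"n/a", "na", "nan", "null", "-", "."}


def blank_missing(csv_text, drop_incomplete=False):
    """Return CSV text with missing-value markers replaced by blanks.

    If drop_incomplete is True, records containing any blank field are omitted.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(next(reader))  # copy the header row unchanged
    for row in reader:
        cleaned = [
            "" if cell.strip().lower() in NA_MARKERS else cell.strip()
            for cell in row
        ]
        if drop_incomplete and any(cell == "" for cell in cleaned):
            continue
        writer.writerow(cleaned)
    return out.getvalue()


# Made-up example data (times in minutes).
raw = "id,sleep,sb,bmi\n1,480,600,24.1\n2,n/a,610,NA\n3,450,650,27.3\n"
print(blank_missing(raw))
print(blank_missing(raw, drop_incomplete=True))
```

The first call leaves record 2 in place with blank fields; the second omits it entirely.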


There are a few options on the left-hand side for separators, decimal points, etc.


Based on user experience, we recommend short variable names. This matters most for the compositional covariates, where it must be possible to distinguish any two of them by the first two letters of the variable name.


Compositional covariates should use a common scale, e.g. minutes, hours, proportion of the day, but it does not matter which.
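
The choice of scale does not matter because compositional analysis works with ratios between components, and ratios are unchanged when every component is multiplied by the same constant. The short sketch below illustrates this with made-up values: the same day expressed in minutes and in hours gives identical proportions. The function name `close` and the example values are assumptions for illustration only.

```python
def close(parts, total=1.0):
    """Rescale the parts of a composition so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]


# Made-up day: sleep, sedentary, light activity, MVPA (in minutes).
minutes = [480.0, 600.0, 300.0, 60.0]
hours = [m / 60 for m in minutes]

# Both scales lead to the same proportions of the day.
print(close(minutes))
print(close(hours))
```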


Once the data is uploaded it will display on the right hand side of the screen. Depending on the size of your data it may be necessary to scroll right to see all of the fields.


Identify the response variable / outcome you are interested in on the bottom left. You can only consider one at a time.


If you just want to try out the tool without using your own data there is a simple dummy dataset below.


2. Covariates


The next stage is to identify the covariates / confounding variables you would like to include in the model. You will need to indicate continuous and categorical variables separately to enable the tool to identify variable types.

 

The tool includes some limited functionality for selecting covariates for inclusion (AIC, and likelihood ratio tests on the effect of dropping individual variables) displayed on the right. Changing the selected covariates will cause these calculations to update automatically.

 

3. ILR Coordinates

 

Select the variables in your data that make up the composition you would like to examine.


If there are zeroes in the data you will need to either remove them manually outside of the tool, or use the imputation functionality to replace the zeroes with conditionally (less than detection threshold) imputed values.


If you use the imputation functionality then select the detection limits by component, e.g.


  • if sleep time is reported to the nearest hour and data is given in minutes, then the detection limit is 0.5*60 = 30 (the smallest time that will be distinguished from zero);
  • if MVPA is measured on an accelerometer using 1 minute epochs, and data is given in proportion of the day, then the detection limit is 1/60/24 = 0.00069444.
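
The two examples above follow the same arithmetic, which can be sketched as a small helper. The function name, its parameters, and the unit labels are hypothetical and are not part of the tool; the helper simply converts the smallest distinguishable time into the unit used in the data.

```python
# Conversion factors from minutes to the unit used in the data (assumed labels).
UNITS_PER_MINUTE = {
    "minutes": 1.0,
    "hours": 1.0 / 60.0,
    "proportion_of_day": 1.0 / (60.0 * 24.0),
}


def detection_limit(resolution_minutes, data_unit, rounded_to_nearest=False):
    """Smallest value distinguishable from zero, expressed in the data's unit.

    resolution_minutes: recording resolution (e.g. epoch length) in minutes.
    rounded_to_nearest: True if values are rounded to the nearest multiple of
        the resolution, in which case anything below half a step reads as zero.
    """
    smallest = resolution_minutes / 2.0 if rounded_to_nearest else resolution_minutes
    return smallest * UNITS_PER_MINUTE[data_unit]


# Sleep reported to the nearest hour, data in minutes.
print(detection_limit(60, "minutes", rounded_to_nearest=True))

# MVPA from 1 minute epochs, data as a proportion of the day.
print(detection_limit(1, "proportion_of_day"))
```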


Once you have identified the compositional variables, imputed as needed, and given the tool a little time to compute, it will provide a list of the available transformed CoDa variables for carrying out the regression.


These are calculated using the "robCompositions" package in R.


We have used a similar naming convention to robCompositions:


"_" indicates division and "." indicates geometric averaging, so for example "SB_MV.LI" would indicate that the ilr coordinate is based on the logratio between "SB" and the geometric average of "MV" and "LI". The original variable names are often abbreviated in the ilr coordinate names.


Currently the tool has been built to handle 3 or 4 components. In addition, the list of ilr coordinates for 4 components is restricted to those built on the pivot coordinate approach. We hope to expand the functionality to higher dimensions, and a more general set of balances, in the near future.
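
To make the naming convention concrete, here is a minimal sketch of a pivot coordinate such as "SB_MV.LI" for a 3-part composition. It assumes the usual pivot-coordinate scaling, sqrt(k/(k+1)) times the log of one part over the geometric mean of the k remaining parts; the values the tool reports may differ in sign or scaling. The function names and the example values are illustrative only.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def pivot_coordinate(numerator, others):
    """Scaled logratio of one part to the geometric mean of the other parts.

    Assumes the common normalising constant sqrt(k / (k + 1)), where k is the
    number of parts in the denominator.
    """
    k = len(others)
    scale = math.sqrt(k / (k + 1))
    return scale * math.log(numerator / geometric_mean(others))


# Made-up values in minutes per day.
SB, MV, LI = 600.0, 60.0, 300.0

sb_mv_li = pivot_coordinate(SB, [MV, LI])  # "SB_MV.LI"
mv_li = pivot_coordinate(MV, [LI])         # "MV_LI"
print(sb_mv_li, mv_li)
```

Because only ratios enter the calculation, the same coordinates are obtained whether the components are given in minutes, hours, or proportions of the day.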


The ilr coordinates for individual members are shown on the right hand side.


Basic model selection statistics (AIC, Likelihood Ratio Test p-values) for the complete compositional model with covariates are shown on the right hand side.


Regression coefficients are shown on the right hand side.


4. Diagnostics


Basic diagnostic plots and the coefficient of determination (as for standard linear regression models) are displayed here. No inputs.


5. Comparator/Null Hypothesis


On this tab you can compare two compositional linear regression models using a likelihood ratio test. Typically we would use this functionality to test the inclusion / exclusion of multiple covariates, for example testing the covariates + compositional variables against just the covariates to determine the statistical significance of the overall composition.
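
For readers who want to check a comparison by hand, the sketch below shows one standard form of the likelihood ratio test for two nested Gaussian linear models fitted to the same n records: the statistic n*ln(RSS_reduced/RSS_full) is compared with a chi-squared distribution whose degrees of freedom equal the number of extra parameters. This is an assumption about the general technique, not a description of the tool's internals, and the residual sums of squares used here are made up.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-squared distribution with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) plus a finite series of correction terms.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


def likelihood_ratio_test(rss_reduced, rss_full, n, extra_params):
    """Return (statistic, p-value) for nested Gaussian linear models."""
    stat = n * math.log(rss_reduced / rss_full)
    return stat, chi2_sf(stat, extra_params)


# Made-up example: covariates only versus covariates plus a 3-part composition,
# which adds 2 ilr coordinates to the model.
stat, p = likelihood_ratio_test(rss_reduced=250.0, rss_full=230.0, n=100, extra_params=2)
print(stat, p)
```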


6. Forecasting


This tab displays (given sufficient time) a ternary heat map of the expected outcomes forecast by the model. Data points can be superimposed on the heat map if requested. You can adjust the covariates on the left-hand side.


Due to the limitations of the ternary plot, you can only look at 3 compositional variables at a time. For 4-part compositions, we recommend producing 4 ternary plots, one for each possible 3-part subcomposition.
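
The four subcompositions of a 4-part composition can be listed mechanically. The sketch below, with hypothetical component names and made-up values, enumerates each choice of 3 components and rescales it to sum to 1, which is the form a ternary plot expects.

```python
from itertools import combinations


def subcompositions(composition, size=3):
    """Return every sub-composition of the given size, rescaled to sum to 1."""
    result = []
    for names in combinations(composition, size):
        total = sum(composition[name] for name in names)
        result.append({name: composition[name] / total for name in names})
    return result


# Made-up day in minutes; the component names are illustrative only.
day = {"Sleep": 480.0, "SB": 600.0, "LI": 300.0, "MV": 60.0}

for sub in subcompositions(day):
    print(sub)
```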

Tutorial for Linear Regression Model

Tutorial for MANOVA Model


Data Security / Privacy

Users should consider the ethical issues associated with uploading their data before using these applications, including whether additional anonymisation is appropriate. We do not actively collect data uploaded to the applications; however, we cannot guarantee how long uploaded data will be held by our hosting service.


If you are interested in using these tools but have concerns about uploading your data, we are happy to provide the means to run them on a private server. Please contact us if this would be of interest.


The applications are hosted on shinyapps.io. The information below is taken from their user guide.


"Each app is deployed into its own container, and the network access between containers is tightly controlled. All access to the apps is over SSL....
 

...The design of the system is for every account to have its own sub-directory structure, and to enforce the security at the file system and operating system levels. The storage for each container is not permanent, so if you need to store data, our strong recommendation is for you to push that data into your own data store. That could be a database such as Amazon’s RDS, or it could be on a file system accessible from within your application.
 

shinyapps.io is currently hosted on Amazon’s Web Services (AWS) infrastructure in the us-east-1 region. The infrastructure used is not the HIPAA-compliant stack, so if you need to be in a HIPAA-compliant environment, we recommend deploying and operating your own Shiny Server or Shiny Server Professional instance."

Example Data for PACRM