3 Basic challenge
It’s now time to put into practice what you learned today! Try to solve the following challenge. The aim is to write a functioning workflow that takes the following sample sheet as input:
filename,value,number_of_rows,number_of_columns
file1.txt,3,10,10
file2.txt,7,5,45
file3.txt,a_value,45,1
file4.txt,trweter,43,9
file5.txt,109,14,3
file6.txt,aaa,1,12
file7.txt,g,96,76
file8.txt,eew,11,11
file9.txt,1ww,21,34
file10.txt,45,8,2
file11.txt,jh,6,1
file12.txt,96,1,5
For each file in the sample sheet (under the filename column), the workflow should create an output file with that name and place it in the folder ../nextflow_output/challenge.
Each file should contain the content of the column value for that file.
This content should be repeated number_of_columns times on the same line, with the instances separated by commas.
The file should contain this line repeated number_of_rows times.
So in the end, each file should essentially be a CSV file with a number of rows equal to the value in the column number_of_rows and a number of columns equal to the value in the column number_of_columns. The files should have no header, and each cell should contain the value stored in the column value.
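The rule above can be written down compactly. The following is a minimal Python sketch of the expected file content for one sample sheet row, not part of the workflow itself; the function name expected_content is hypothetical, and the trailing newline after the last row is an assumption.

```python
def expected_content(value: str, number_of_rows: int, number_of_columns: int) -> str:
    """Return the text one output file should hold, one CSV row per line."""
    # One row: the value repeated number_of_columns times, comma-separated.
    row = ",".join([value] * number_of_columns)
    # The whole file: that row repeated number_of_rows times.
    # Assumption: the file ends with a newline after the last row.
    return "\n".join([row] * number_of_rows) + "\n"


if __name__ == "__main__":
    # Sample sheet row "file6.txt,aaa,1,12": a single line with twelve cells.
    print(expected_content("aaa", 1, 12), end="")
```

The same logic can be expressed in bash or R inside a process script; Python is used here only to make the specification concrete.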
For example, ../nextflow_output/challenge/file1.txt should contain:
3,3,3,3,3,3,3,3,3,3
3,3,3,3,3,3,3,3,3,3
3,3,3,3,3,3,3,3,3,3
3,3,3,3,3,3,3,3,3,3
3,3,3,3,3,3,3,3,3,3
3,3,3,3,3,3,3,3,3,3
3,3,3,3,3,3,3,3,3,3
3,3,3,3,3,3,3,3,3,3
3,3,3,3,3,3,3,3,3,3
3,3,3,3,3,3,3,3,3,3
and ../nextflow_output/challenge/file6.txt should contain:
aaa,aaa,aaa,aaa,aaa,aaa,aaa,aaa,aaa,aaa,aaa,aaa
You can use as many or as few processes as you want to achieve this result. The challenge can be solved using only the Nextflow features that we discussed, plus a bit of bash or Python/R scripting.
You can find the solution to the challenge (main.nf file) in the following block.
nextflow.enable.dsl = 2

// Builds one header-less CSV file per sample sheet row
process create_matrix {
    conda "r-base r-tidyverse"
    publishDir "../nextflow_output/challenge"

    input:
    tuple(
        val(outname),
        val(fill_value),
        val(nrows),
        val(ncols)
    )

    output:
    path "$outname"

    script:
    """
    #!/usr/bin/env Rscript
    library("tidyverse")
    # Fill an nrows x ncols matrix with the requested value
    content <- rep("$fill_value", ${ncols}*${nrows})
    mat <- matrix(content, ${nrows}, ${ncols})
    # as_tibble() replaces the deprecated as.tibble(); the matrix has no column names
    df <- as_tibble(mat, .name_repair = "unique")
    write_csv(df, "${outname}", col_names=FALSE)
    """
}

workflow {
    // Read the sample sheet and turn each row into a tuple matching the process input
    Channel.fromPath( params.input )
        .splitCsv(header: true)
        .map{
            [ it["filename"], it["value"], it["number_of_rows"], it["number_of_columns"] ]
        }
        .set{ input_ch }
    create_matrix( input_ch )
}
If the sample sheet is saved as ../nextflow_output/samplesheet.csv, the solution can be run with the following command:
nextflow run main.nf --input ../nextflow_output/samplesheet.csv
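Once the run has finished, the published files can be compared against the sample sheet. The following is a minimal Python sketch of such a check, not part of the original solution: the function name check_outputs is hypothetical, and it assumes the output files are plain comma-separated text without quoting, as in the examples above.

```python
import csv
from pathlib import Path


def check_outputs(samplesheet: Path, outdir: Path) -> list[str]:
    """Return the names of output files that are missing or have unexpected content."""
    failures = []
    with open(samplesheet, newline="") as handle:
        for record in csv.DictReader(handle):
            # Expected row: the value repeated number_of_columns times.
            row = ",".join([record["value"]] * int(record["number_of_columns"]))
            # Expected file: that row repeated number_of_rows times.
            expected = [row] * int(record["number_of_rows"])
            target = outdir / record["filename"]
            if not target.is_file() or target.read_text().splitlines() != expected:
                failures.append(record["filename"])
    return failures


if __name__ == "__main__":
    # Paths taken from the text above; nothing is checked if the sheet is absent.
    sheet = Path("../nextflow_output/samplesheet.csv")
    if sheet.is_file():
        bad = check_outputs(sheet, Path("../nextflow_output/challenge"))
        print("all files match" if not bad else f"mismatches: {bad}")
```

An empty list means every file named in the sample sheet exists and has the expected rows and columns.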