Sequencing run processing
Metadata registration
Usage
- find_and_register_project_metdata.py
[-h] -p PROJET_INFO_PATH -d DBCONFIG -t USER_ACCOUNT_TEMPLATE -n SLACK_CONFIG -u HPC_USER -a HPC_ADDRESS -l LDAP_SERVER [-s] [-c] [-i] [-m]
Parameters
- -h, --help
: Show this help message and exit
- -p, --projet_info_path
: Project metadata directory path
- -d, --dbconfig
: Database configuration file path
- -t, --user_account_template
: User account information email template file path
- -s, --log_slack
: Toggle slack logging
- -n, --slack_config
: Slack configuration file path
- -c, --check_hpc_user
: Toggle HPC user checking
- -u, --hpc_user
: HPC user name for LDAP server checking
- -a, --hpc_address
: HPC address for LDAP server checking
- -l, --ldap_server
: LDAP server address
- -i, --setup_irods
: Setup iRODS account for user
- -m, --notify_user
: Notify user about new account and password
Monitor sequencing run for demultiplexing
Usage
- find_new_seqrun_and_prepare_md5.py
[-h] -p SEQRUN_PATH -m MD5_PATH -d DBCONFIG_PATH -s SLACK_CONFIG -a ASANA_CONFIG -i ASANA_PROJECT_ID -n PIPELINE_NAME -j SAMPLESHEET_JSON_SCHEMA [-e EXCLUDE_PATH]
Parameters
- -h, --help
: show this help message and exit
- -p, --seqrun_path SEQRUN_PATH
: Seqrun directory path
- -m, --md5_path MD5_PATH
: Seqrun md5 output directory
- -d, --dbconfig_path DBCONFIG_PATH
: Database configuration json file
- -s, --slack_config SLACK_CONFIG
: Slack configuration json file
- -a, --asana_config ASANA_CONFIG
: Asana configuration json file
- -i, --asana_project_id ASANA_PROJECT_ID
: Asana project id
- -n, --pipeline_name PIPELINE_NAME
: IGF pipeline name
- -j, --samplesheet_json_schema SAMPLESHEET_JSON_SCHEMA
: JSON schema for samplesheet validation
- -e, --exclude_path EXCLUDE_PATH
: List of subdirectories excluded from the search
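The md5 step above records a checksum for each file in a run directory before demultiplexing starts, so later transfers can be verified. The idea can be sketched with the standard library alone; `build_md5_manifest` is a hypothetical helper for illustration, not the script's actual implementation:

```python
import hashlib
import os

def build_md5_manifest(seqrun_path, chunk_size=1024 * 1024):
    """Walk a run directory and return {relative_path: md5_hexdigest}.

    Files are read in chunks so large BCL files do not have to fit in memory.
    """
    manifest = {}
    for root, _dirs, files in os.walk(seqrun_path):
        for name in files:
            file_path = os.path.join(root, name)
            md5 = hashlib.md5()
            with open(file_path, 'rb') as fh:
                # iter() keeps calling read() until it returns b'' (EOF)
                for chunk in iter(lambda: fh.read(chunk_size), b''):
                    md5.update(chunk)
            manifest[os.path.relpath(file_path, seqrun_path)] = md5.hexdigest()
    return manifest
```

The resulting mapping can be dumped to the `-m` md5 output directory in whatever format the downstream check expects.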
Switch off project barcode checking
Usage
- mark_project_barcode_check_off.py
[-h] -p PROJET_ID_LIST -d DBCONFIG [-s] -n SLACK_CONFIG
Parameters
- -h, --help
: show this help message and exit
- -p, --projet_id_list PROJET_ID_LIST
: A file path listing project_igf_id
- -d, --dbconfig DBCONFIG
: Database configuration file path
- -s, --log_slack
: Toggle slack logging
- -n, --slack_config SLACK_CONFIG
: Slack configuration file path
Accept modified samplesheet for demultiplexing run
Usage
- reset_samplesheet_for_pipeline.py
[-h] -p SEQRUN_PATH -d DBCONFIG -n SLACK_CONFIG -a ASANA_CONFIG -i ASANA_PROJECT_ID -f INPUT_LIST
Parameters
- -h, --help
: show this help message and exit
- -p, --seqrun_path SEQRUN_PATH
: Sequencing run directory path
- -d, --dbconfig DBCONFIG
: Database configuration file path
- -n, --slack_config SLACK_CONFIG
: Slack configuration file path
- -a, --asana_config ASANA_CONFIG
: Asana configuration file path
- -i, --asana_project_id ASANA_PROJECT_ID
: Asana project id
- -f, --input_list INPUT_LIST
: Sequencing run id list file
Copy files to temp directory for demultiplexing run
Usage
- moveFilesForDemultiplexing.py
[-h] -i INPUT_DIR -o OUTPUT_DIR -s SAMPLESHEET_FILE -r RUNINFO_FILE
Parameters
- -h, --help
: show this help message and exit
- -i, --input_dir INPUT_DIR
: Input files directory
- -o, --output_dir OUTPUT_DIR
: Output files directory
- -s, --samplesheet_file SAMPLESHEET_FILE
: Illumina format samplesheet file
- -r, --runinfo_file RUNINFO_FILE
: Illumina format RunInfo.xml file
Transfer metadata from sample entries to experiments
Usage
- update_experiment_metadata_from_sample_attribute.py
[-h] -d DBCONFIG -n SLACK_CONFIG
Parameters
- -h, --help
: show this help message and exit
- -d, --dbconfig DBCONFIG
: Database configuration file path
- -n, --slack_config SLACK_CONFIG
: Slack configuration file path
Pipeline control
Reset pipeline for data processing
Usage
- batch_modify_pipeline_seed.py
[-h] -t TABLE_NAME -p PIPELINE_NAME -s SEED_STATUS -d DBCONFIG -n SLACK_CONFIG -a ASANA_CONFIG -i ASANA_PROJECT_ID -f INPUT_LIST
Parameters
- -h, --help
: show this help message and exit
- -t, --table_name TABLE_NAME
: Table name for igf id lookup
- -p, --pipeline_name PIPELINE_NAME
: Pipeline name for seed modification
- -s, --seed_status SEED_STATUS
: New seed status for pipeline_seed table
- -d, --dbconfig DBCONFIG
: Database configuration file path
- -n, --slack_config SLACK_CONFIG
: Slack configuration file path
- -a, --asana_config ASANA_CONFIG
: Asana configuration file path
- -i, --asana_project_id ASANA_PROJECT_ID
: Asana project id
- -f, --input_list INPUT_LIST
: IGF id list file
Samplesheet processing
Divide samplesheet data
Usage
- divide_samplesheet.py
[-h] -i SAMPLESHEET_FILE -d OUTPUT_DIR [-p]
Parameters
- -h, --help
: show this help message and exit
- -i, --samplesheet_file SAMPLESHEET_FILE
: Illumina format samplesheet file
- -d, --output_dir OUTPUT_DIR
: Output directory for writing samplesheet file
- -p, --print_stats
: Print available stats for the samplesheet and exit
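Dividing a samplesheet boils down to grouping the rows of the `[Data]` section by their `Sample_Project` column and writing one output file per group. A minimal sketch of the grouping step, assuming a standard Illumina CSV layout (the `split_data_section_by_project` helper is hypothetical, not the script's actual code):

```python
import csv
from collections import defaultdict

def split_data_section_by_project(samplesheet_path):
    """Group [Data] section rows of an Illumina samplesheet by Sample_Project."""
    projects = defaultdict(list)
    with open(samplesheet_path) as fh:
        in_data, header = False, None
        for row in csv.reader(fh):
            if row and row[0].startswith('['):
                # a new section starts; track whether it is [Data]
                in_data = row[0].strip().startswith('[Data]')
                header = None
                continue
            if not in_data or not row:
                continue
            if header is None:
                header = row  # first row after [Data] is the column header
                continue
            record = dict(zip(header, row))
            projects[record.get('Sample_Project', '')].append(record)
    return dict(projects)
```

Each per-project group would then be written back out under the original `[Header]`/`[Settings]` sections to produce a valid standalone samplesheet.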
Reformat samplesheet for demultiplexing
Usage
- reformatSampleSheet.py
[-h] -i SAMPLESHEET_FILE -f RUNINFOXML_FILE [-r] -o OUTPUT_FILE
Parameters
- -h, --help
: show this help message and exit
- -i, --samplesheet_file SAMPLESHEET_FILE
: Illumina format samplesheet file
- -f, --runinfoxml_file RUNINFOXML_FILE
: Illumina RunInfo.xml file
- -r, --revcomp_index
: Reverse complement HiSeq and NextSeq index2 column, default: True
- -o, --output_file OUTPUT_FILE
: Reformatted samplesheet file
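The `-r` option above reverse complements the index2 column, which is needed because HiSeq and NextSeq instruments read the i5 index on the opposite strand. The transformation itself is a plain reverse complement of the barcode string:

```python
# Map each base to its complement; translate() applies it per character
_COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')

def reverse_complement(index_seq):
    """Return the reverse complement of an index barcode sequence."""
    return index_seq.translate(_COMPLEMENT)[::-1]
```

For example, an i5 barcode written as `ATCG` in the samplesheet becomes `CGAT` after reformatting.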
Calculate basesmask for demultiplexing
Usage
- makeBasesMask.py
[-h] -s SAMPLESHEET_FILE -r RUNINFO_FILE [-a READ_OFFSET] [-b INDEX_OFFSET]
Parameters
- -h, --help
: show this help message and exit
- -s, --samplesheet_file SAMPLESHEET_FILE
: Illumina format samplesheet file
- -r, --runinfo_file RUNINFO_FILE
: Illumina format RunInfo.xml file
- -a, --read_offset READ_OFFSET
: Extra sequencing cycle for reads, default: 1
- -b, --index_offset INDEX_OFFSET
: Extra sequencing cycle for index, default: 0
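A bases mask tells bcl2fastq how to use each sequencing cycle: `y` cycles are kept as read data, `i` cycles as index, and `n` cycles are ignored. The mask is derived by comparing the read layout in RunInfo.xml against the index lengths actually present in the samplesheet. A simplified sketch of that comparison, ignoring the read/index offsets above (`make_bases_mask` is a hypothetical helper, not the script's real logic):

```python
def make_bases_mask(run_reads, index_lengths):
    """Build a bcl2fastq-style bases mask string.

    run_reads: list of (num_cycles, is_indexed_read) tuples from RunInfo.xml.
    index_lengths: index lengths from the samplesheet, consumed in order;
                   a length of 0 means the whole index read is masked out.
    """
    mask, idx = [], 0
    for cycles, is_index in run_reads:
        if not is_index:
            mask.append(f'y{cycles}')
            continue
        used = index_lengths[idx] if idx < len(index_lengths) else 0
        idx += 1
        if used == 0:
            mask.append(f'n{cycles}')          # index read not used at all
        elif cycles > used:
            mask.append(f'i{used}n{cycles - used}')  # trim extra cycles
        else:
            mask.append(f'i{cycles}')
    return ','.join(mask)
```

For a 151-cycle paired-end run with dual 8-cycle index reads where the samplesheet only uses index1, this yields `y151,i8,n8,y151`.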
Create or modify data in the database
Clean up data from existing database and create new tables
Usage
- clean_and_rebuild_database.py
[-h] -d DBCONFIG_PATH -s SLACK_CONFIG
Parameters
- -h, --help
: Show this help message and exit
- -d, --dbconfig_path
: Database configuration json file
- -s, --slack_config
: Slack configuration json file
Load flowcell rules data to database
Usage
- load_flowcell_rules_data.py
[-h] -f FLOWCELL_DATA [-u] -d DBCONFIG_PATH -s SLACK_CONFIG
Parameters
- -h, --help
: Show this help message and exit
- -f, --flowcell_data
: Flowcell rules data json file
- -u, --update
: Update existing flowcell rules data, default: False
- -d, --dbconfig_path
: Database configuration json file
- -s, --slack_config
: Slack configuration json file
Load pipeline configuration to database
Usage
- load_pipeline_data.py
[-h] -p PIPELINE_DATA [-u] -d DBCONFIG_PATH -s SLACK_CONFIG
Parameters
- -h, --help
: Show this help message and exit
- -p, --pipeline_data
: Pipeline data json file
- -u, --update
: Update existing pipeline data, default: False
- -d, --dbconfig_path
: Database configuration json file
- -s, --slack_config
: Slack configuration json file
Load sequencing platform information to database
Usage
- load_platform_data.py
[-h] -p PLATFORM_DATA [-u] -d DBCONFIG_PATH -s SLACK_CONFIG
Parameters
- -h, --help
: Show this help message and exit
- -p, --platform_data
: Platform data json file
- -u, --update
: Update existing platform data, default: False
- -d, --dbconfig_path
: Database configuration json file
- -s, --slack_config
: Slack configuration json file
Load sequencing run information to database from a text input
Usage
- load_seqrun_data.py
[-h] -p SEQRUN_DATA -d DBCONFIG_PATH -s SLACK_CONFIG
Parameters
- -h, --help
: Show this help message and exit
- -p, --seqrun_data
: Seqrun data json file
- -d, --dbconfig_path
: Database configuration json file
- -s, --slack_config
: Slack configuration json file
Load file entries and build collection in database
Usage
- load_files_collecion_to_db.py
[-h] -f COLLECTION_FILE_DATA -d DBCONFIG_PATH [-s]
Parameters
- -h, --help
: show this help message and exit
- -f, --collection_file_data COLLECTION_FILE_DATA
: Collection file data json file
- -d, --dbconfig_path DBCONFIG_PATH
: Database configuration json file
- -s, --calculate_checksum
: Toggle file checksum calculation
Check storage utilisation
Calculate disk usage summary
Usage
- calculate_disk_usage_summary.py
[-h] -p DISK_PATH [-c] [-r REMOTE_SERVER] -o OUTPUT_PATH
Parameters
- -h, --help
: show this help message and exit
- -p, --disk_path DISK_PATH
: List of disk paths for summary calculation
- -c, --copy_to_remoter
: Toggle file copy to remote server
- -r, --remote_server REMOTE_SERVER
: Remote server address
- -o, --output_path OUTPUT_PATH
: Output directory path
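The summary itself amounts to totalling file sizes under each monitored path, much like `du -s`. A minimal stand-alone sketch of that collection step (`disk_usage_summary` is a hypothetical helper for illustration):

```python
import os

def disk_usage_summary(paths):
    """Return {path: total_bytes} by walking each path and summing file sizes."""
    summary = {}
    for path in paths:
        total = 0
        for root, _dirs, files in os.walk(path):
            for name in files:
                fp = os.path.join(root, name)
                # guard against broken symlinks and files removed mid-walk
                if os.path.isfile(fp):
                    total += os.path.getsize(fp)
        summary[path] = total
    return summary
```

The per-path totals are then written to the `-o` output path, and optionally copied to the remote server when `-c` is set.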
Calculate disk usage for a top level directory
Usage
- calculate_sub_directory_usage.py
[-h] -p DIRECTORY_PATH [-c] [-r REMOTE_SERVER] -o OUTPUT_FILEPATH
Parameters
- -h, --help
: show this help message and exit
- -p, --directory_path DIRECTORY_PATH
: A directory path for subdirectory lookup
- -c, --copy_to_remoter
: Toggle file copy to remote server
- -r, --remote_server REMOTE_SERVER
: Remote server address
- -o, --output_filepath OUTPUT_FILEPATH
: Output gviz file path
Merge disk usage summary files and build a gviz JSON
Usage
- merge_disk_usage_summary.py
[-h] -f CONFIG_FILE [-l LABEL_FILE] [-c] [-r REMOTE_SERVER] -o OUTPUT_FILEPATH
Parameters
- -h, --help
: show this help message and exit
- -f, --config_file CONFIG_FILE
: A configuration json file for disk usage summary
- -l, --label_file LABEL_FILE
: A json file for disk label name
- -c, --copy_to_remoter
: Toggle file copy to remote server
- -r, --remote_server REMOTE_SERVER
: Remote server address
- -o, --output_filepath OUTPUT_FILEPATH
: Output gviz file path
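The gviz output is a Google Visualization-style JSON table: a `cols` list describing the columns and a `rows` list of cell values, which a charting frontend can consume directly. A minimal sketch of turning merged usage figures into that shape, with the optional label file applied as display names (`usage_to_gviz_json` is a hypothetical helper, and the column layout here is an assumption, not the scripts' exact schema):

```python
import json

def usage_to_gviz_json(usage, labels=None):
    """Convert {disk_path: bytes} into a Google Visualization-style JSON table.

    labels optionally maps a disk path to a human-readable display name.
    """
    labels = labels or {}
    cols = [
        {'id': 'disk', 'label': 'Disk', 'type': 'string'},
        {'id': 'usage', 'label': 'Usage (bytes)', 'type': 'number'},
    ]
    rows = [
        {'c': [{'v': labels.get(path, path)}, {'v': size}]}
        for path, size in sorted(usage.items())
    ]
    return json.dumps({'cols': cols, 'rows': rows})
```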
Seed analysis pipeline
A script for finding new experiment entries and seeding the analysis pipeline
Usage
- find_and_seed_new_analysis.py
[-h] -d DBCONFIG_PATH -s SLACK_CONFIG -p PIPELINE_NAME -t FASTQ_TYPE -f PROJECT_NAME_FILE [-m SPECIES_NAME] [-l LIBRARY_SOURCE]
Parameters
- -h, --help
: show this help message and exit
- -d, --dbconfig_path DBCONFIG_PATH
: Database configuration json file
- -s, --slack_config SLACK_CONFIG
: Slack configuration json file
- -p, --pipeline_name PIPELINE_NAME
: IGF pipeline name
- -t, --fastq_type FASTQ_TYPE
: Fastq collection type
- -f, --project_name_file PROJECT_NAME_FILE
: File containing project names for seeding analysis pipeline
- -m, --species_name SPECIES_NAME
: Species name to filter analysis
- -l, --library_source LIBRARY_SOURCE
: Library source to filter analysis