Bash – How to split file name into variable

bash ksh

Suppose I have a list of csv files with the following format:

INT_V1_<Product>_<ID>_<Name>_<ddmmyy>.csv
ASG_B1_V1_<Product>_<ID>_<Name>_<ddmmyy>.csv

The INT_V1_ and ASG_B1_V1_ prefixes are fixed, meaning every csv file starts with one of them.
How can I split the file names into variables?
For example, I want to capture the Name and assign it to a variable $Name.

Best Answer

With zsh:

file='INT_V1_<Product>_<ID>_<Name>_<ddmmyy>.csv'

setopt extendedglob
if [[ $file = (#b)*_(*)_(*)_(*)_(*).csv ]]; then
  product=$match[1] id=$match[2] name=$match[3] date=$match[4]
fi

With bash 4.3 or newer, ksh93t or newer, or zsh in sh emulation, you could split the string on _ characters and reference the fields from the end (in native zsh, you'd rather simply do field=("${(@s:_:)file}") for splitting than use the nonsensical split+glob operator of sh):

IFS=_            # split on underscores
set -o noglob    # disable the glob half of split+glob
field=($file)    # split+glob operator
date=${field[-1]%.*}
name=${field[-2]}
id=${field[-3]}
product=${field[-4]}

Or (bash 3.2 or newer):

if [[ $file =~ .*_(.*)_(.*)_(.*)_(.*)\.csv$ ]]; then
  product=${BASH_REMATCH[1]}
  id=${BASH_REMATCH[2]}
  name=${BASH_REMATCH[3]}
  date=${BASH_REMATCH[4]}
fi

(This assumes $file contains valid text in the current locale, which is not guaranteed for file names unless you fix the locale to C or another locale with a single-byte character set.)

Like zsh's * above, the .* is greedy: the first one eats as many *_ sequences as possible, so the remaining .* groups can only match _-free strings.
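To illustrate the greedy behaviour on the longer prefix, here is a minimal bash sketch that runs the same regex against a made-up ASG_B1_V1_ name. The sample values (Widget, 42, Alice, 010124) are hypothetical, and storing the regex in a variable named re is just a convenience, not part of the original answer.

```bash
#!/usr/bin/env bash
# Hypothetical sample name using the longer ASG_B1_V1_ prefix.
file='ASG_B1_V1_Widget_42_Alice_010124.csv'

# Keeping the regex in a variable avoids quoting surprises inside [[ ]].
re='.*_(.*)_(.*)_(.*)_(.*)\.csv$'

if [[ $file =~ $re ]]; then
  # The leading .* greedily consumed "ASG_B1_V1", so the four groups
  # are the last four _-separated fields before .csv.
  product=${BASH_REMATCH[1]}
  id=${BASH_REMATCH[2]}
  name=${BASH_REMATCH[3]}
  date=${BASH_REMATCH[4]}
  printf '%s\n' "product=$product id=$id name=$name date=$date"
fi
```

Because the first .* absorbs everything up to the fourth underscore from the end, the same pattern should work unchanged for both the INT_V1_ and ASG_B1_V1_ prefixes.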

With ksh93, you could do:

pattern='*_(*)_(*)_(*)_(*).csv'
product=${file//$pattern/\1}
id=${file//$pattern/\2}
name=${file//$pattern/\3}
date=${file//$pattern/\4}

In a POSIX sh script, you could use the ${var#pattern}, ${var%pattern} standard parameter expansion operators:

rest=${file%.*} # remove .csv suffix
date=${rest##*_} # remove everything on the left up to the rightmost _
rest=${rest%_*} # remove one _* from the right
name=${rest##*_}
rest=${rest%_*}
id=${rest##*_}
rest=${rest%_*}
product=${rest##*_}
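If several files need parsing, the same expansions can be wrapped in a function. The sketch below assumes a hypothetical helper called parse_name and two made-up file names; it sets the global variables product, id, name and date, since POSIX sh has no local variables.

```sh
# parse_name is a hypothetical helper, not part of the original answer.
# It sets the globals product, id, name and date from its first argument.
parse_name() {
  rest=${1%.*}        # remove the .csv suffix
  date=${rest##*_}    # keep what follows the rightmost _
  rest=${rest%_*}     # drop that last field
  name=${rest##*_}
  rest=${rest%_*}
  id=${rest##*_}
  rest=${rest%_*}
  product=${rest##*_}
}

# Made-up sample names covering both prefixes.
for f in INT_V1_Widget_42_Alice_010124.csv ASG_B1_V1_Gadget_7_Bob_311223.csv; do
  parse_name "$f"
  printf '%s\n' "$f: product=$product id=$id name=$name date=$date"
done
```

Since the fields are peeled off from the right, the length of the fixed prefix does not matter.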

Or use the split+glob operator again:

IFS=_
set -o noglob
set -- $file
shift "$(($# - 4))"
product=$1 id=$2 name=$3 date=${4%.*}
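Note that IFS=_ and set -o noglob stay in effect for the rest of the script. One possible way to contain them, sketched here with a hypothetical split_name function and a made-up file name, is to save and restore both settings around the split. The sketch assumes IFS was set (not unset) beforehand, and the field-count guard is an addition for illustration.

```sh
# split_name is a hypothetical helper, not part of the original answer.
# It sets the globals product, id, name and date from its first argument.
split_name() {
  old_ifs=$IFS                       # assumes IFS was set, not unset
  case $- in
    *f*) had_noglob=1 ;;             # noglob was already on
    *)   had_noglob=0 ;;
  esac

  IFS=_
  set -o noglob
  set -- $1                          # split+glob, with the glob part disabled

  IFS=$old_ifs                       # restore the caller's settings
  [ "$had_noglob" -eq 1 ] || set +o noglob

  [ "$#" -ge 4 ] || return 1         # guard: need at least four fields
  shift "$(($# - 4))"
  product=$1 id=$2 name=$3 date=${4%.*}
}

# Made-up sample name.
split_name 'ASG_B1_V1_Gadget_7_Bob_311223.csv' &&
  printf '%s\n' "product=$product id=$id name=$name date=$date"
```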