Determine how long tabs ‘\t’ are on a line

Tags: control-characters, text-processing

In text processing, is there a way to know whether a tab is 8 characters wide (the default) or narrower?

For example, in a tab-delimited file, if a field's content is shorter than one tab width (7 characters or fewer) and is followed by a tab, that tab is only ‘tab size – field size’ columns wide.

Is there a way to get the total width of the tabs on a line? I'm not looking for the number of tabs (i.e. 10 tabs should not return 10) but for the number of columns those tabs occupy.

For the following input (fields separated by a single tab):

field0  field00 field000        last-field
fld1    fld11   fld001  last-fld
fd2     fld3    last-fld

I expect the total tab width of each line to be:

11
9
9

Best Answer

The TAB character is a control character which, when sent to a terminal¹, makes the terminal's cursor move to the next tab stop. By default, in most terminals, tab stops are 8 columns apart, but that's configurable.

You can also have tab stops at irregular intervals:

$ tabs 3 9 11; printf '\tx\ty\tz\n'
  x     y z

Only the terminal knows how many columns to the right a TAB will move the cursor.

You can get that information by querying the cursor position from the terminal before and after the tab has been sent.

If you want to make that calculation yourself for a given line, assuming the line is printed starting at the first column of the screen, you'll need to:

  • know where the tab-stops are²
  • know the display width of every character
  • know the width of the screen
  • decide whether you want to handle other control characters like \r (which moves the cursor to the first column) or \b (which moves the cursor back one column)...

It can be simplified if you assume the tab stops are every 8 columns, the line fits on the screen, and there are no other control characters or characters (or non-characters) that your terminal cannot display properly.
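Under exactly those assumptions (tab stops every 8 columns, every non-TAB character one column wide, so effectively ASCII-only input), the calculation is simple enough to do by hand in a portable awk, without any GNU extension. This is only a minimal sketch; the function name tab_widths and the ts variable are my own choices, not from any standard tool:

```sh
# Hypothetical helper: print, for each input line, the total width of its TABs.
# Assumes tab stops every 8 columns (ts=8) and that every other character
# is exactly one column wide (ASCII input, no other control characters).
tab_widths() {
  awk -v ts=8 '{
    col = 0; total = 0
    for (i = 1; i <= length($0); i++) {
      if (substr($0, i, 1) == "\t") {
        w = ts - col % ts   # columns left until the next tab stop
        total += w
        col += w
      } else {
        col++               # assume every other character is 1 column wide
      }
    }
    print total
  }' "$@"
}

printf 'field0\tfield00\tfield000\tlast-field\nfld1\tfld11\tfld001\tlast-fld\nfd2\tfld3\tlast-fld\n' | tab_widths
# prints 11, 9 and 9 on separate lines
```

Changing ts gives regularly spaced stops at another interval; irregular stops (like the tabs 3 9 11 example) would need a lookup in a list of stop positions instead of the modulo.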

With GNU wc, if the line is stored in $line:

width=$(printf %s "$line" | wc -L)
width_without_tabs=$(printf %s "$line" | tr -d '\t' | wc -L)
width_of_tabs=$((width - width_without_tabs))

wc -L gives the width of the widest line in its input. It does that by using wcwidth(3) to determine the width of characters and assuming the tab stops are every 8 columns.
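To get one number per line, as in the question, you can wrap that in a read loop. A sketch, assuming GNU wc (for -L); the function name tab_width_per_line is hypothetical:

```sh
# Hypothetical wrapper around the wc -L computation above: one result per line.
# Requires GNU wc for -L. IFS= keeps read from stripping leading/trailing TABs,
# and the "|| [ -n "$line" ]" part handles a last line with no trailing newline.
tab_width_per_line() {
  while IFS= read -r line || [ -n "$line" ]; do
    width=$(printf %s "$line" | wc -L)
    width_without_tabs=$(printf %s "$line" | tr -d '\t' | wc -L)
    echo "$((width - width_without_tabs))"
  done
}

printf 'field0\tfield00\tfield000\tlast-field\n' | tab_width_per_line
# prints 11
```

Forking two pipelines per line is slow on big files, but it keeps wc's wcwidth()-based handling of wide and zero-width characters.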

For non-GNU systems, and with the same assumptions, see @Kusalananda's approach. It's even better, as it lets you specify the tab stops, but unfortunately it currently doesn't work with GNU expand (at least) when the input contains multi-byte characters, zero-width characters (like combining characters) or double-width characters.
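A sketch of the same idea built on expand (not necessarily @Kusalananda's exact code): expand the TABs to spaces, delete them instead, and compare the lengths. It assumes GNU expand and one column per character (ASCII input, say); tab_widths_expand is a hypothetical name, and its argument is passed unchanged to expand -t:

```sh
# Hypothetical helper: total TAB width per line, using expand(1) for the tab stops.
# $1 is a tab-stop spec for expand -t (default 8), e.g. 8 or 2,8,10.
# Assumes ${#var} equals the display width, i.e. one column per character.
tab_widths_expand() {
  tabstops=${1:-8}
  while IFS= read -r line || [ -n "$line" ]; do
    expanded=$(printf '%s\n' "$line" | expand -t "$tabstops")
    stripped=$(printf '%s\n' "$line" | tr -d '\t')
    echo "$(( ${#expanded} - ${#stripped} ))"
  done
}

printf '\tx\ty\tz\n' | tab_widths_expand 2,8,10
# prints 8 (2 + 5 + 1 columns)
```

expand -t takes 0-based column positions, so the stops set with tabs 3 9 11 in the earlier example correspond to expand -t 2,8,10 here.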


¹ Note, though, that if you do stty tab3, the tty device line discipline takes over tab processing (it converts TABs to spaces based on its own idea of where the cursor might be before sending them to the terminal) and implements tab stops every 8 columns. Testing on Linux, it seems to properly handle CR, LF and BS characters as well as multibyte UTF-8 ones (provided iutf8 is also on), but that's about it. It assumes all other non-control characters (including zero-width and double-width characters) have a width of 1, it (obviously) doesn't handle escape sequences, it doesn't wrap properly... That's probably intended for terminals that can't do tab processing themselves.

In any case, the tty line discipline does need to know where the cursor is, and uses those heuristics above: when using the icanon line editor (like when you enter text for applications such as cat that don't implement their own line editor) and you press Tab then Backspace, the line discipline needs to know how many BS characters to send to erase that Tab character from the display. If you change where the tab stops are (like with tabs 12), you'll notice that Tabs are not erased properly. Same if you enter double-width characters before pressing Tab then Backspace.


² For that, you could send tab characters and query the cursor position after each one. Something like:

tabs=$(
  saved_settings=$(stty -g)
  stty -icanon min 1 time 0 -echo
  gawk -vRS=R -F';' -vORS= < /dev/tty '
    function out(s) {print s > "/dev/tty"; fflush("/dev/tty")}
    BEGIN{out("\r\t\33[6n")}
    $NF <= prev {out("\r"); exit}
    {print sep ($NF - 1); sep=","; prev = $NF; out("\t\33[6n")}'
  stty "$saved_settings"
)

Then you can pass that to expand, as in expand -t "$tabs", in @Kusalananda's solution.
