With C, omitting meaningful error messages:
#define _POSIX_C_SOURCE 200809L  /* for getline() */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    FILE *L;                  /* list of line numbers to print, in ascending order */
    FILE *F;                  /* file to print lines from */
    unsigned int to_print;
    unsigned int current = 0;
    char *line = NULL;
    size_t len = 0;

    if ((L = fopen(argv[1], "r")) == NULL) {
        return 1;
    } else if ((F = fopen(argv[2], "r")) == NULL) {
        fclose(L);
        return 1;
    } else {
        while (fscanf(L, "%u", &to_print) > 0) {
            /* skip lines of F until the requested line number */
            while (getline(&line, &len, F) != -1 && ++current != to_print)
                ;
            if (current == to_print) {
                printf("%s", line);
            }
        }
        free(line);
        fclose(L);
        fclose(F);
        return 0;
    }
}
Some systems have a truncate command that truncates files to a given number of bytes (not characters).
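For example, with GNU coreutils' truncate (file name hypothetical):

```shell
printf 'hello world' > file.txt   # 11 bytes
truncate -s 5 file.txt            # keep only the first 5 bytes
cat file.txt                      # -> hello
```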
I don't know of any that truncate to a number of characters, though you could resort to perl, which is installed by default on most systems:
perl
perl -Mopen=locale -ne '
BEGIN{$/ = \1234} truncate STDIN, tell STDIN; last' <> "$file"
With -Mopen=locale, we use the locale's notion of what characters are (so in locales using the UTF-8 charset, that's UTF-8 encoded characters). Replace it with -CS if you want I/O to be decoded/encoded in UTF-8 regardless of the locale's charset.
$/ = \1234: we set the record separator to a reference to an integer, which is a way to specify records of fixed length (in number of characters).
Then, upon reading the first record, we truncate stdin in place (that is, at the end of the first record) and exit.
GNU sed
With GNU sed, you could do (assuming the file doesn't contain NUL characters or sequences of bytes which don't form valid characters, both of which should be true of text files):
sed -Ez -i -- 's/^(.{1234}).*/\1/' "$file"
But that's far less efficient, as it reads the file in full, stores it whole in memory, and writes a new copy.
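For instance, keeping the first 5 characters of a small sample file (name hypothetical):

```shell
printf 'hello world' > sample2.txt
sed -Ez -i -- 's/^(.{5}).*/\1/' sample2.txt
cat sample2.txt   # -> hello
```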
GNU awk
Same with GNU awk:
awk -i inplace -v RS='^$' -e '{printf "%s", substr($0, 1, 1234)}' -E /dev/null "$file"
-e code -E /dev/null "$file" is one way to pass arbitrary file names to gawk (-E ends option processing, so a file name starting with - is not taken as an option).
RS='^$': slurp mode (a record separator regexp that can never match in a non-empty file, so the whole input is read as a single record).
Shell builtins
With ksh93, bash or zsh (with shells other than zsh, assuming the content doesn't contain NUL bytes):
content=$(cat < "$file" && echo .) &&
content=${content%.} &&
printf %s "${content:0:1234}" > "$file"
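For instance, in bash, keeping the first 7 characters of a two-line sample file (name hypothetical); the echo . / ${content%.} dance preserves the trailing newlines that command substitution would otherwise strip:

```shell
printf 'hello\nworld\n' > sample3.txt
content=$(cat < sample3.txt && echo .) &&
content=${content%.} &&
printf %s "${content:0:7}" > sample3.txt
wc -c < sample3.txt   # 7 bytes: "hello", its newline, and "w"
```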
With zsh:
read -k1234 -u0 s < $file &&
printf %s $s > $file
Or:
zmodload zsh/mapfile
mapfile[$file]=${mapfile[$file][1,1234]}
With ksh93 or bash (beware it's bogus for multi-byte characters in several versions of bash):
IFS= read -rN1234 s < "$file" &&
printf %s "$s" > "$file"
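For instance, in bash (file name hypothetical), keeping the first 5 characters:

```shell
printf 'hello world' > sample4.txt
IFS= read -rN5 s < sample4.txt &&
printf %s "$s" > sample4.txt
cat sample4.txt   # -> hello
```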
ksh93 can also truncate the file in place, instead of rewriting it, with its <>; redirection operator:
IFS= read -rN1234 0<>; "$file"
iconv + head
To print the first 1234 characters, another option could be to convert to an encoding with a fixed number of bytes per character, like UTF32BE/UCS-4:
iconv -t UCS-4 < "$file" | head -c "$((1234 * 4))" | iconv -f UCS-4
head -c is not standard, but fairly common. A standard equivalent would be dd bs=1 count="$((1234 * 4))", but that would be less efficient, as it would read the input and write the output one byte at a time¹. iconv is a standard command, but the encoding names are not standardized, so you might find systems without UCS-4.
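A sketch printing the first 5 characters; an explicit -f UTF-8 is added here so the demo doesn't depend on the locale's charset (and on systems lacking UCS-4, UTF-32BE may be available instead):

```shell
printf 'hello world' |
  iconv -f UTF-8 -t UCS-4 |     # 4 bytes per character
  head -c "$((5 * 4))" |        # first 5 characters = 20 bytes
  iconv -f UCS-4 -t UTF-8
# -> hello
```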
Notes
In any case, though the output would have at most 1234 characters, it may end up not being valid text, as it would possibly end in a non-delimited line.
Also note that while those solutions wouldn't cut text in the middle of a character, they could break it in the middle of a grapheme, like an é expressed as U+0065 U+0301 (an e followed by a combining acute accent), or Hangul syllable graphemes in their decomposed forms.
¹ And on pipe input, you can't reliably use bs values other than 1 unless you use the iflag=fullblock GNU extension, as dd could do short reads if it reads the pipe quicker than iconv fills it.
Best Answer
will count the bytes in the tenth line of myfile (including the linefeed/newline character). A slightly less readable variant, (or sed '10!d;q' or sed '10q;d'), will stop reading the file after the tenth line, which would be interesting on longer files (or streams). (Thanks to Tim Kennedy and Peter Cordes for the discussion leading to this.) There are performance comparisons of different ways of extracting lines of text in cat line X to line Y on a huge file.
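For instance, combining the sed '10q;d' variant above with wc -c to count the bytes of the tenth line (sample file hypothetical):

```shell
seq 12 | sed 's/^/line/' > myfile   # line1 ... line12
sed '10q;d' myfile | wc -c          # "line10" plus its newline: 7 bytes
```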