I have a file formatted like this:
train/t/temple/east_asia/00000025.jpg 94
train/t/temple/east_asia/00000865.jpg 94
...
train/s/swamp/00000560.jpg 92
train/s/swamp/00000935.jpg 92
....
train/m/mountain/00000428.jpg 68
train/m/mountain/00000126.jpg 68
The last number is the class number. I have 50 different classes, and each class has 1,000 lines. I would like to take a random sample of size N from each class, and store the result in another text file.
Best Answer
Since your lines are grouped by class, you could (with GNU tools) `split` the file into pieces and use the `--filter` option to pipe each piece to `shuf -n N`, which extracts N random lines from it.
Note that `split` defaults to 1000 lines per piece, which is exactly what you need in this particular case. If the requirements change, you'll have to pass the number of lines via `-l`. For example, `-l 200` combined with `shuf -n 30` splits the file into pieces of 200 lines and extracts 30 random lines from each piece.
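The approach described above could be sketched roughly as follows. This assumes GNU coreutils (`split` with `--filter`, and `shuf`); the function name `sample_per_class` and the file names `infile` and `outfile` are hypothetical, chosen only for illustration. It also relies on every class occupying exactly the same number of consecutive lines.

```shell
# Hypothetical helper: sample_per_class LINES_PER_CLASS N INFILE OUTFILE
# Assumes GNU split (for --filter) and GNU shuf are available.
sample_per_class() {
    lines_per_class=$1
    n=$2
    infile=$3
    outfile=$4
    # split hands each piece of $lines_per_class lines to the filter on stdin.
    # The filter's stdout is split's stdout, so a single redirection
    # collects the samples from all pieces into one file.
    split -l "$lines_per_class" --filter="shuf -n $n" "$infile" > "$outfile"
}
```

With the layout from the question (1,000 lines per class), `sample_per_class 1000 10 infile outfile` would write 10 random lines per class to `outfile`; `sample_per_class 200 30 infile outfile` corresponds to the 200-line, 30-sample variant. Because the default piece size is already 1000, the first case could also be written without `-l`.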