I have a very long series of URLs with no separating character, in this format:
http://example.comhttp://example.nethttp://example.orghttp://etc...
I want each URL to be on a new line. I tried to do this by replacing every instance of "http://" with "\nhttp://" using sed:
sed 's_http://_\nhttp://_g' urls.txt
But this causes a segmentation fault (a memory access violation). I can only surmise that the sheer size of the file (over 100 GB) is causing sed to exceed some limit.
I could split the file into several smaller files for processing, but no instance of "http://" could be cut across two files.
Is there a better way to do this?
Best Answer
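Because sed works line by line and this file contains no newlines, sed has to hold the entire file in memory as one single line. With awk you can avoid reading such a huge amount of text at once:
The success may depend on the awk implementation used. For example, gawk works fine, but mawk crashes.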