Ubuntu – Bash script to edit excel files

bashcsvscriptsxls

I have multiple excel files in this format.
They are clock in and clock out date and time.

---------------------------------
| Name     | Time               |
---------------------------------
| Person A | 03-Jul-17 8:15 AM  |
| Person A | 03-Jul-17 10:32 AM |
| Person A | 03-Jul-17 1:56 PM  |
| Person A | 03-Jul-17 6:15 PM  |
| Person A | 04-Jul-17 8:29 AM  |
| Person A | 04-Jul-17 8:58 AM  |
| Person A | 04-Jul-17 9:43 AM  |
| Person A | 04-Jul-17 1:03 PM  |
| Person A | 04-Jul-17 2:17 PM  |
| Person A | 04-Jul-17 5:58 PM  |
.
.
.
| Person A | 31-Jul-17 7:45 AM  |
| Person A | 31-Jul-17 8:10 AM  |
| Person A | 31-Jul-17 3:26 PM  |
| Person A | 31-Jul-17 7:29 PM  |
---------------------------------

I would want to extract this data and save it as a new excel file in this format:

---------------------------------------------
| Name     | Date      | Time In | Time Out |
---------------------------------------------
| Person A | 03-Jul-17 | 8:15 AM | 6:15PM   |
| Person A | 04-Jul-17 | 8:29 AM | 5:58PM   |
.
.
.
| Person A | 31-Jul-17 | 7:45 AM | 7:29PM   |
---------------------------------------------

Basically it's to arrange the data one entry per date with the earliest time for that date as the Time In and the latest time for that date as the Time Out.

There are multiple excel files in this format, and doing it manually is going to take too long.

If you want to convert them to .csv first edit then convert them back to .xlsx, it's cool.

PS: Bounty of 200 rep up for grabs.

Best Answer

There are 2 scripts you need. The one to convert from XLS to CSV is a command xls2csv and the other one a script from github: csv2xls (another csv2xls). There are also csv2xlsx (and another csv2xlsx).

In between the 2 conversions you can edit the files using your favorite tool.

If you want to do it yourself: xlsx files (and the same applies ODT (open/libeoffice) are zipped archives and contain an XML with the data. You can unzip then and the data is in an XML. Manipulating the XML is a bit more difficult that a CSV sure but when the manipulating is automated it becomes rather efficient.

Related Question