How to construct the grep command on zsh to return a specific value

catalina, command line, terminal, zsh

I am trying to use the command line to extract a specific value from a web page.

The page is from the VS Code marketplace: https://marketplace.visualstudio.com/items?itemName=HCLTechnologies.hclappscancodesweep and I am after the number of installs.
I managed to get to a point where I can isolate up to this:

{"statisticName":"install","value":206.0},{"statisticName":"averagerating","value":5.0},{"statisticName":"ratingcount","value":8.0},{"statisticName":"trendingdaily","value":0.0},{"statisticName":"trendingmonthly","value":88.950276243093924},{"statisticName":"trendingweekly","value":28.7292817679558},{"statisticName":"weightedRating","value":4.668447071838485}],"installationTargets":[{"target":"Microsoft.VisualStudio.Code","targetVersion":""}],"deploymentType":0}

I am on a mac and using zsh. The command I'm currently using is:

curl -G "https://marketplace.visualstudio.com/items?itemName=HCLTechnologies.hclappscancodesweep" | grep -Eo {\"statisticName\":\"install\",\"value\":.+\}

The desired output of this command should be:
206.0 which is the number I'm after. If you can trim it down to 206 that's even better.

I have 2 main problems:
1) I can't isolate the match to start right before my number. I tried \K but that doesn't seem to work on the mac. I did not see an option to use groups either.

2) I can't make the matching after my number stop after the first match of a curly bracket }. Keep in mind the number can vary in length, so I want to stop it where the JSON stops.
I did actually try using [:digit:] and [0-9] to restrict to the number, but that did not work either and that's why I went down the curly bracket path.

I did see some options out there to work with the JSON part and extract things there, but the environment I'm working with does not have the flexibility of installing other utilities, so I need help to achieve this with a "vanilla" environment.

P.S. I'm working on Catalina 10.15.4

Best Answer

grep isn't well suited to data scraping; you are better off with sed in such cases.

curl -sG 'https://marketplace.visualstudio.com/items?itemName=HCLTechnologies.hclappscancodesweep' \
    | sed -Ene 's/.*\"install\",\"value\":([0-9\.]+).*/\1/p'

The first part of the substitution matches only once in the HTML file, so there is no need to select the correct line first.
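To see how the substitution works without hitting the network, here is a minimal sketch that runs the same kind of expression against a shortened copy of the JSON fragment from the question. The variable names `sample` and `installs` are only illustrative, and the pattern is written without the backslash escapes, which should not be needed inside single quotes.

```shell
# Shortened sample taken from the question; stands in for the curl output.
sample='{"statisticName":"install","value":206.0},{"statisticName":"averagerating","value":5.0}'

# -E enables extended regular expressions; -n suppresses the default output.
# The leading and trailing .* swallow everything around the number, the
# parentheses capture it, \1 puts it back, and the final p prints the line
# only when the substitution actually happened.
installs=$(printf '%s\n' "$sample" \
    | sed -En 's/.*"install","value":([0-9.]+).*/\1/p')

echo "$installs"   # prints 206.0
```

Because the bracket expression only accepts digits and dots, the capture stops on its own at the closing curly bracket, whatever the length of the number.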

To get the number without the decimal part, use:

curl -sG 'https://marketplace.visualstudio.com/items?itemName=HCLTechnologies.hclappscancodesweep' \
    | sed -Ene 's/.*\"install\",\"value\":([0-9]+).*/\1/p'
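If you would rather stay with grep, one possible workaround for the missing `\K` is to chain two `grep -o` calls: the first isolates the key together with the integer part of the value, and the second keeps only the trailing digits. This is a sketch run against a shortened sample from the question, not the live page; `sample` and `installs` are illustrative names.

```shell
# Shortened sample taken from the question; stands in for the curl output.
sample='{"statisticName":"install","value":206.0},{"statisticName":"averagerating","value":5.0}'

# First grep: keep only the text  "install","value":206
# Second grep: keep only the run of digits at the end of that text.
installs=$(printf '%s\n' "$sample" \
    | grep -Eo '"install","value":[0-9]+' \
    | grep -Eo '[0-9]+$')

echo "$installs"   # prints 206
```

Quoting the pattern in single quotes keeps zsh from interpreting the curly brackets and double quotes, and matching `[0-9]+` rather than `.+` keeps the match from running on to the end of the line.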