Can I relate the crash to availability mode settings?
No. By the looks of your log messages, you actually lost cluster quorum when two nodes' votes were removed, followed by your witness (file share). Is this a three-node cluster with a file share witness, by chance? If so, and you pulled these events from one node's event log, it may appear to each node that it has lost communication with all voters. That would generate a similar, if not the same, error footprint as the one above: nobody can talk to anybody.
During that window, quorum is lost, which is what you're currently seeing. There is some assumption here, as I'd need far more diagnostic information to pinpoint the cause of the vote removal, but that is why quorum was lost.
Regardless, this appears to be a problem that surfaced as a downed cluster, in which case your availability mode has nothing to do with the WSFC cluster failing.
As for "best practices" for the availability mode to go with, you need to determine requirements for data loss, performance impact, and a few other factors that are best described in this BOL reference on Availability Modes.
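Once you've settled on a mode, checking and changing it is straightforward. A sketch, assuming a hypothetical availability group named `MyAG` and a replica named `SQLNODE2` (substitute your own names):

```sql
-- Check the current availability mode of each replica
SELECT ag.name, ar.replica_server_name, ar.availability_mode_desc
FROM sys.availability_replicas AS ar
JOIN sys.availability_groups AS ag
    ON ag.group_id = ar.group_id;

-- Switch a replica to synchronous commit (run on the primary)
ALTER AVAILABILITY GROUP [MyAG]
MODIFY REPLICA ON N'SQLNODE2'
WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);
```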
1) How do I revert the AGs back to the original synchronous mode? Should I perform a forced manual failover of an availability group back to the PRODUCTION servers (synchronous) from the DR/BCP (asynchronous) node?
You've failed over to asynchronous nodes. This means all of the database flows are paused and there is no current way (assuming it's a true disaster) how far behind your secondary replicas were. Now that they've come back up, we know that it isn't going to be 100% the same data (they are asynchronous). This was not mentioned in the question but I'm going to add it to the answer as it's extremely important as it's part of your SLA.
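A quick way to gauge how far behind each secondary currently is (a sketch using the `sys.dm_hadr_database_replica_states` DMV; run it on the primary):

```sql
SELECT ar.replica_server_name,
       DB_NAME(drs.database_id)          AS database_name,
       drs.synchronization_state_desc,
       drs.log_send_queue_size,   -- KB of log not yet sent to the secondary
       drs.redo_queue_size,       -- KB of log received but not yet redone
       drs.last_commit_time
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drs.replica_id;
```

Nonzero send/redo queues on an asynchronous replica are the data you stand to lose in a forced failover.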
- Take a database backup of the original primary databases. To do this you will need to kick them out of the AG. Alternatively you could take a database snapshot (inside of SQL, not SAN).
- Resume data movement between the replicas at the database level.
- Set the local primary replica to synchronous and a remote replica to synchronous.
- Wait for them both to say 'Synchronized'. They will start out as 'Synchronizing'. This could take quite a while depending on multiple factors such as downtime, data generated, IO, Networking, etc.
- Once Synchronized, find a good time to take a minute or less outage. Use that time to do a planned manual failover.
- Set the local servers to be synchronous to each other.
- Remove synchronous from the remote replica and set it back to async.
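The steps above can be sketched in T-SQL. This assumes a hypothetical AG named `MyAG`, database `MyDatabase`, and replicas `PRODNODE1`/`DRNODE1`; substitute your own names:

```sql
-- 1. Resume data movement (per database, on each replica where it is paused)
ALTER DATABASE [MyDatabase] SET HADR RESUME;

-- 2. Make the remote replica synchronous so it can catch up
ALTER AVAILABILITY GROUP [MyAG]
MODIFY REPLICA ON N'PRODNODE1'
WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);

-- 3. Wait until sys.dm_hadr_database_replica_states reports SYNCHRONIZED,
--    then, during the planned outage window, fail over
--    (run on the replica you are failing over TO)
ALTER AVAILABILITY GROUP [MyAG] FAILOVER;

-- 4. Afterwards, set the remote replica back to asynchronous commit
ALTER AVAILABILITY GROUP [MyAG]
MODIFY REPLICA ON N'DRNODE1'
WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);
```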
2) We have one extra BCP node, MSDTC, and an AG listener. Should I bring them online before I perform a forced manual failover of an availability group from the PRODUCTION servers (synchronous) to the DR/BCP (asynchronous) node?
How do you have an "extra AOAG listener" just chilling out? I don't understand that part of your question.
MSDTC is for distributed transactions, which aren't supported with availability groups in SQL Server 2012/2014 and are only supported with certain restrictions (as of this moment) in 2016. Thus it is not required in a supported scenario. If you're going unsupported, it's still a moot point, as the local MSDTC will be used.
Unless I'm missing something (completely possible) or not understanding, this is not needed.
Best Answer
Then you have quite a good bit of development work ahead of you. Seriously.
Everyone asks this same question, but it's not an easy one to answer with any more resolution than "it failed over at this time." Things such as virtualization make this even harder, as the tools and the hypervisor itself may do things that are outside the purview of your sandbox.
However, to get you started, the usual places to scrub for data:
- The SQL Server error logs on each replica
- The AlwaysOn_health and system_health extended-events sessions
- The Windows cluster log (generated with Get-ClusterLog)
- The Windows System and Application event logs on each node
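One of the more useful sources is the AlwaysOn_health extended-events session, which runs by default and writes `.xel` files to the instance's log directory. A sketch for reading them (adjust the path pattern to your environment):

```sql
-- Read the AlwaysOn_health extended-events files; events such as
-- availability_replica_state_change carry failover timestamps and roles
SELECT object_name,
       CAST(event_data AS xml) AS event_data,
       file_name
FROM sys.fn_xe_file_target_read_file(N'AlwaysOn_health*.xel', NULL, NULL, NULL);
```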
Best of luck.
That's an organizational information-sharing/administration issue, and not one you're going to get much help with through a simple SQL Server alert. In fact, to do this properly, you'll want to take SQL Server out of the picture!
Most likely you're going to need to write a service that constantly consumes event logs, traces, error logs, etc., and then takes action when issues occur.