To answer your questions in order:
- The cube doesn't store medians, modes (or even averages), but you can write queries that calculate them and embed them as calculated measures in the cube. The ability to embed this sort of computation is one of the main unique selling points of OLAP technology.
- If you have a dimension that can identify individual rows (which could be a degenerate or 'fact' dimension derived from an identifier on the fact table) then you can query at the level of individual rows. OLAP still works in terms of dimensions and aggregates, so each such member's 'aggregate' is simply composed of a single value.
- Any OLAP tool can do what's described in (2), and most also support a mechanism known as 'drill-through', where the cube returns the transactional rows (a bordereau) underlying a given cell or slice.
If you want to do calculations that aren't directly available in the cube script, many OLAP tools, such as the late, lamented ProClarity, will let you formulate queries involving custom MDX-based calculations. As long as the cube holds the information the calculation needs, custom MDX calculations should be able to support any computation you want.
Although OLAP queries are traditionally associated with statistical queries in aggregate, if you have a dimension that allows drill-down to the detail you need, it is certainly possible to formulate queries that calculate medians or percentiles, or histogram queries from which modes can be inferred or computed.
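To make the distinction concrete, here is a minimal sketch in plain Python (not MDX) of why medians and modes need drill-down detail rather than pre-stored sums and counts; the rows and column names are invented for illustration:

```python
from statistics import median
from collections import Counter

# Hypothetical transaction-level detail, as returned by drilling down
# to the lowest grain of a cube: (customer, sale_amount) pairs.
sales = [("a", 100), ("b", 250), ("a", 100), ("c", 400), ("b", 250), ("a", 175)]

amounts = [amount for _, amount in sales]

# Median: needs the individual values, not just a sum/count aggregate.
med = median(amounts)

# Mode: inferred from a histogram (value -> frequency) of the detail rows.
histogram = Counter(amounts)
mode_value, mode_count = histogram.most_common(1)[0]

print(med, mode_value)
```

The point is that both computations consume the individual detail values, which is why a dimension capable of reaching that grain (or a drill-through) is a prerequisite.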
Pareto analysis queries, for example, can be formulated this way, based on rankings.
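The ranking-based idea behind a Pareto analysis can be sketched in a few lines of plain Python (product names and figures are invented): rank members by a measure, accumulate their share of the total, and keep those that make up the first 80%.

```python
# Hypothetical product revenues.
revenue = {"A": 500, "B": 300, "C": 120, "D": 50, "E": 30}

total = sum(revenue.values())
ranked = sorted(revenue.items(), key=lambda kv: kv[1], reverse=True)

# Walk down the ranking, accumulating share of total revenue,
# and keep the products that together account for the first 80%.
cumulative, top = 0.0, []
for product, amount in ranked:
    cumulative += amount / total
    top.append(product)
    if cumulative >= 0.8:
        break

print(top)
```

In MDX the same shape is typically expressed with ranking/top-percent style set functions over a dimension, with the cube supplying the measure.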
Many cube products can operate in a hybrid or relational OLAP mode, where they do not persist the data themselves but query it from an underlying database. In addition, pure ROLAP tools such as Business Objects, Report Builder or Discoverer can query an underlying database and work row by row. However, they tend to lack the sophistication of dedicated OLAP products, and they don't offer much statistical analysis capability out of the box.
SQL Server has JSON support, though it seems to be a variation on its XML support.
PostgreSQL has had JSON support since 9.2.
Teradata has had it since 2014.
This makes them hybrids. Obviously PostgreSQL is the open-source one.
It really depends on what you need from JSON support. If it is just to return a document on its key, that has always been possible without explicit JSON support.
If you want to index certain parts of the JSON document, that does require specific JSON support. There is also the question of how sparse the attribute you want to index is. If it is present in the vast majority of cases (or is even a mandatory attribute), I would break it out of the JSON and make it an explicit attribute in a hybrid store.
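A minimal sketch of that hybrid pattern, using Python's built-in sqlite3 so it runs anywhere (table and attribute names are invented): the near-mandatory attribute is promoted to an explicit, indexable column, while sparse attributes stay inside the JSON document, and fetching the whole document by key needs no JSON support at all.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Hybrid design: 'status' is promoted to an explicit, indexed column;
# sparse attributes like 'gift_wrap' remain inside the JSON document.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, doc TEXT)")
conn.execute("CREATE INDEX idx_orders_status ON orders (status)")

doc = {"status": "shipped", "gift_wrap": True, "notes": "leave at door"}
conn.execute("INSERT INTO orders VALUES (?, ?, ?)",
             (1, doc["status"], json.dumps(doc)))

# Returning the document on its key never needed JSON support...
row = conn.execute("SELECT doc FROM orders WHERE id = 1").fetchone()

# ...and filtering on the promoted attribute uses an ordinary index,
# with the sparse attributes still recoverable by parsing the document.
shipped = conn.execute("SELECT doc FROM orders WHERE status = 'shipped'").fetchall()
print(json.loads(row[0])["gift_wrap"])
```

Indexing an attribute that stays *inside* the document is the part that genuinely needs database-level JSON support (e.g. expression or GIN-style indexes over extracted values).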
If by parallelize-or-die you mean support a distributed dataset or die, then I'm not sure I agree. For OLTP work you would have to be in a Times top 100 company to approach the limits of a traditional RDBMS. Most companies simply don't generate as much useful data as they would like to think.
Companies like Facebook, Twitter and Netflix deal with data orders of magnitude beyond what most people will ever see.
For high-end web analytics work, then yes, you might want a NoSQL product as a collector. Cassandra is useful in that respect, and it also has tunable consistency.
The big gotcha in distributed systems is Brewer's CAP theorem: you can have at most two of Consistency, Availability and Partition tolerance. In practice it's a bit more blurred than that, but the two-out-of-three rule is generally true.
There's also the challenge of how you handle referential integrity and, in some cases, how you honour primary key constraints. Your data might be splashed across many servers, so honouring a foreign key might require a lookup from one server to another. Even if this were a feature, it would seriously impact performance.
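To illustrate why, here is a toy simulation (the shards are just dicts, and all names are invented): enforcing a foreign key when the referenced table lives on another server turns every insert into a remote round trip.

```python
# Two 'servers' holding different tables, simulated as dicts.
customers_shard = {"c1": "Alice", "c2": "Bob"}   # server 1: customers
orders_shard = {}                                 # server 2: orders

remote_lookups = 0

def insert_order(order_id, customer_id):
    """Honouring the foreign key means asking the other server first."""
    global remote_lookups
    remote_lookups += 1   # every insert pays a (simulated) network round trip
    if customer_id not in customers_shard:
        raise ValueError("foreign key violation: unknown customer")
    orders_shard[order_id] = customer_id

insert_order("o1", "c1")
```

In a real distributed store that per-insert round trip (or a distributed transaction) is exactly the performance cost the answer alludes to, which is why most NoSQL products simply don't offer the feature.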
If people want to interact with an API that only talks JSON, I see no problem with that, provided there is an underlying DAL that abstracts away whether a data attribute lives in a JSON document or as an explicit attribute in a hybrid design.
Best Answer
An OLAP cube stores data in an optimized way, using dimensions and measures, to provide quick responses to various types of complex queries. Dimensions organize data around an area of interest, such as customers, stores, locations or time. Measures perform business-specific calculations and present them in the required format. Together they drastically improve query performance for reporting.
Advantages:
• High Speed of Data Processing
• Aggregated and Detailed Data
• Multidimensional Data Representation
Limitations:
• OLAP requires restructuring of data into a star/snowflake schema
• A single OLAP cube can hold only a limited number of dimensions (fields)
• It is nearly impossible to access transactional data in the OLAP cube
• Changes to an OLAP cube require a full update (reprocess) of the cube
ROLAP and MOLAP are the two main storage models in OLAP. ROLAP builds a multidimensional view of the data dynamically from relational tables, while MOLAP stores a static multidimensional view of the data in multidimensional databases (MDDBs).