Teradata to drive responsive web application

teradata

Does anyone have experience using Teradata to drive a typical, responsive client-side web application? What is your evaluation? Specifically, how well does it perform in single-read and single-write?

Single-read: The total time-to-first-byte between sending a request and receiving a response when reading a single document or row that contains meaningful information for a webapp. For example, consider a table called profiles with 100,000 records in which various attributes are stored on each row. How much time does it take to get a single profile up to a client-side webapp by querying on a primary key? How much time does it take to do the same operation querying on a non-indexed field?
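
For concreteness, here is a minimal sketch of the two read patterns, assuming a hypothetical profiles table whose unique primary index is profile_id and whose email column is not indexed:

    -- Single-AMP read: the WHERE clause uses the (unique) primary index,
    -- so Teradata hashes straight to one AMP and one row.
    SELECT *
    FROM profiles
    WHERE profile_id = 12345;

    -- All-AMP read: email is not indexed, so every AMP scans its slice
    -- of the 100,000 rows to find the single matching one.
    SELECT *
    FROM profiles
    WHERE email = 'user@example.com';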

Single-write: How much time does it take to insert a new row or update an existing row? Assume that any indexes needed to perform the single-read quickly must also be written during the single-write operation.
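
And the corresponding single-write operations, again with invented column names:

    -- Single-row insert; the primary index value determines the target AMP.
    INSERT INTO profiles (profile_id, display_name, email)
    VALUES (12346, 'New User', 'new.user@example.com');

    -- Single-row update keyed on the primary index, so only one row hash is
    -- locked (plus the maintenance of any secondary indexes).
    UPDATE profiles
    SET display_name = 'Renamed User'
    WHERE profile_id = 12346;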

If you have built a client-side web application with a Teradata backend, what sort of challenges did you face? I have seen Teradata as primarily an analytical and data warehousing tool rather than an operational database, but I am being asked to investigate Teradata as the backend for an operational web application.

I would like to find out if anyone has tried this, and whether it could keep up with more commonly used databases for webapps. I'm not looking for opinions, but for honest evaluations of how the technology works in this unusual use-case by people who have tested it, and for people's experiences using Teradata to drive a responsive webapp.

I understand there are many variables here, but I'm looking for real-world examples of Teradata being used as a backend for a client-side webapp, and how it performs in practice. Thus I'm more interested in TTFB than I am in DBS time, though a thorough breakdown of how long each piece of a transaction takes would be appreciated.

My previous question, Teradata REST API performance metrics, was specifically about Teradata REST performance. However, I've realized that almost no one uses Teradata's REST capabilities in this community (questions have gone unanswered for years).

My particular use case

Insac requested that I share some details of my specific use-case to help focus answers. With the caveat that I'm looking for an answer about how Teradata performs in general for webapps, here is what my app will need to do:

  • Client-side webapp with RESTful interface
  • Estimated ~5,000 users
  • Would be driven by ~25 Teradata views and a handful of materialized tables
  • Teradata instance would be shared with a data warehouse
  • Some views would perform joins across several underlying views
  • The underlying tables for these views are structured for analytics
  • Almost all queries are read-only operations; certainly all of the intensive ones are.
  • A typical page refresh would send off ~100 queries to Teradata after receiving a response from a few initial queries to dynamically generate SQL on the client side.
  • At a minimum, any user action would send off about 20 queries.
  • The web application needs to be responsive and fluid. Responses in less than 2 seconds are a requirement, with sub-second response-times greatly preferred.

A very high-level overview of the purpose is accessing data from a social-network-like timeline feed history for a particular profile. The initial queries would return a chronologically sorted list of events that match basic filters such as category and date. This result set would then drive dynamically generated SQL queries for the first results, returning the content of each event on the feed.
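
As a rough illustration (table and column names here are invented), the two query layers might look like this:

    -- Initial query: chronologically sorted events for one profile,
    -- filtered by category and date.
    SELECT event_id, event_type, event_ts
    FROM feed_events
    WHERE profile_id = 12345
      AND category = 'post'
      AND event_ts >= DATE '2016-01-01'
    ORDER BY event_ts DESC;

    -- One of the ~100 dynamically generated follow-up queries, fetching the
    -- content of a single event from the result set above.
    SELECT e.event_id, c.body, c.author_id
    FROM feed_events e
    JOIN event_content c ON c.event_id = e.event_id
    WHERE e.event_id = 987654;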

I am evaluating Teradata for this application, but would like to know in general if Teradata is suited to client-side webapps so that I can support comparative analysis of database technologies for future projects as well.

Edit: The scope of my project is shifting to include more web-2.0 functionality, meaning the application needs to write to several views, and writes will be a common part of webapp use. With the read-only constraint removed, what would a webapp on Teradata look like?

Best Answer

As I said in my answer to this question, Teradata wouldn't be my first choice for a web backend: it really shines at moving and processing large quantities of data, but the typical workload of a web application (frequent small queries) is not the best scenario for it.

Obviously there are other voices (see here), but there is broad agreement on some critical points:

Teradata locking behavior is the primary challenge for an OLTP style application.

and:

use as much cache as you can: the Teradata connection is a precious resource you'd better not use for information that doesn't need to be realtime

It is clearly not impossible to implement such a solution on Teradata, but it requires your application to adapt fully to Teradata's peculiarities (e.g. you may have to design a non-trivial application data model in order to favor single-AMP queries).

An example of the challenges you're going to face:

In an application I had to update a flag field on a set of rows in a table, chosen with a condition that was not on the Teradata "primary index". However, there was an index on the columns used in the condition.

Usually I would have used a simple "update"; however, in Teradata, this would have taken a "table-level lock". In the end it turned out to be "better" to select all the rows in the application and then do the updates (in batch) one by one by primary index, in order to use "hash-level" locks.
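
Roughly, the difference looked like this (table and column names are invented for the example):

    -- Straightforward version: the WHERE clause is not on the primary index,
    -- so the update takes a table-level write lock for its whole duration.
    UPDATE orders
    SET processed_flag = 'Y'
    WHERE batch_code = 'B42';

    -- Workaround: fetch the qualifying primary index values first...
    SELECT order_id
    FROM orders
    WHERE batch_code = 'B42';

    -- ...then have the application issue one update per row, each keyed on
    -- the primary index so only a hash-level (row-hash) lock is taken.
    UPDATE orders
    SET processed_flag = 'Y'
    WHERE order_id = 1001;  -- repeated (batched) once for each selected id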

=========================================================

Edit: after the additional info from the OP

First the good news:

Mostly read-only

Since it is a mostly "read only" application, you can set up an efficient caching system. You could also "pre-cache" some common content so that the cost of querying it for the first time isn't apparent to the user.

Why that's important in this case: the number of connections that can be created on a Teradata appliance is capped. That pool will be shared between the application's users, the ETL, and the normal OLAP users. In addition, queries that are not single-AMP won't have very good latency, and they'll be more exposed to the instantaneous load of the system.

Fine-grained queries

It can be a good thing if, this way, you're able to issue mostly single-AMP queries.

Why that's important in this case: single-AMP queries (the ones that select/insert/delete using the primary index as the criterion) have lower lock impact and can be handled without paying the cost of coordination between the AMPs. On the other hand, such a "chatty" protocol will tie up more connections.
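
If you want to check whether a given query stays on one AMP, EXPLAIN tells you; a sketch using the hypothetical profiles table from the question:

    -- Primary index access: the EXPLAIN output typically reports a
    -- "single-AMP RETRIEVE step by way of the unique primary index".
    EXPLAIN SELECT * FROM profiles WHERE profile_id = 12345;

    -- Non-indexed access: the same EXPLAIN reports something like an
    -- "all-AMPs RETRIEVE step by way of an all-rows scan".
    EXPLAIN SELECT * FROM profiles WHERE email = 'user@example.com';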

Materialized tables

It's good that you are already thinking about materializing some data. You might even find that it is useful to have the same data materialized in two or more tables with different primary indexes, to satisfy your user interface (e.g. if you have different selection criteria on the same entity). However, this has a cost in terms of ETL time and disk space.
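
A minimal sketch of that idea, with invented names: the same event data kept twice, hashed once by profile and once by event id, so that each access path gets a single-AMP query:

    -- Copy optimized for "all events of one profile" lookups.
    CREATE TABLE feed_by_profile (
        profile_id INTEGER NOT NULL,
        event_id   BIGINT NOT NULL,
        event_ts   TIMESTAMP(0),
        body       VARCHAR(4000)
    ) PRIMARY INDEX (profile_id);

    -- Copy optimized for "fetch one event by id" lookups.
    CREATE TABLE feed_by_event (
        event_id   BIGINT NOT NULL,
        profile_id INTEGER NOT NULL,
        event_ts   TIMESTAMP(0),
        body       VARCHAR(4000)
    ) UNIQUE PRIMARY INDEX (event_id);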

Now, the bad news:

More than 1000 users

As the application is not "completely" read-only, this number of users will probably create access conflicts whenever they have to write.

Why that's a problem: Forget multi-versioning: here readers block writers and vice versa. Your select might stop another user's update from completing. If you are not able to use the primary index for every update you do, you'll be locking the full table; and even if you do use the primary index, you are not going to get a row-level lock, only a "hash-level" one (normally it works fine, but it can have some issues with batch inserts). There is a workaround for this conflict between readers and writers (LOCK ROW FOR ACCESS), but it opens the door to dirty reads, and you should think carefully about whether you can accept this in your application.
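
That workaround is just a request modifier in front of the select; note that the access lock means the read may see rows that another session is in the middle of changing:

    -- Read with an access lock: doesn't block (and isn't blocked by)
    -- concurrent writers, at the price of possible dirty reads.
    LOCKING ROW FOR ACCESS
    SELECT *
    FROM profiles
    WHERE profile_id = 12345;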

Web application and OLAP on the same machine

Independently of the underlying technology, it is always a challenge to accommodate these two different workloads on the same platform.

Why that's a problem: Teradata has some workload management rules that can help here, but they need careful tuning from the DBAs, and extraordinary load (e.g. end-of-month batches) will likely still have an impact on the user experience of the application.

Hourly ETL

The hourly ETL will be a stress point for the application, since you'll be loading the same tables that you are querying. Since you cannot afford to have the readers blocked, you'll probably resort to "dirty reads", but you'll have to find some policy to manage the fact that the application could temporarily show inconsistent data.
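
One common way to make that policy explicit is the "locking view" pattern: the application reads only through views that already apply the access lock, so the hourly loads never block its selects (names invented):

    -- Every application query against this view gets the dirty-read
    -- behaviour consistently, without each developer remembering the modifier.
    CREATE VIEW app_profiles_v AS
    LOCKING ROW FOR ACCESS
    SELECT profile_id, display_name, email
    FROM profiles;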

Why that's a problem: as already said, table locks, dirty reads, and the presence of two different workloads.

Number of queries

I know I said that "finer grained queries" can be a good thing, but here we're talking about 100 queries per user for a page, and a minimum of 20 queries per user action. If you execute them serially, you're stacking up latency (even at an optimistic 10 ms per query, 100 serial queries already add a full second, and you're looking for sub-second response times); but if you parallelize them, you're occupying roughly the maximum number of sessions of a Parsing Engine for a single user (I'm not considering caching here).

Disclaimer: I've spent the last two years on more traditional Teradata projects, but when OLTP and raw latency counted, I don't remember figures going much below the 150 ms you mention. Maybe 130 ms. And that was after optimizations and after configuring the system to accommodate our workload, and only on the simpler queries/macros.

But then, if you already have an instance and you want to verify Teradata's performance in this scenario, I'd suggest you ask Teradata sales support for a "Proof of Concept" on your environment. I think they'll gladly accept (the more load you have, the more likely you are to need additional nodes).

The reason I wanted to know so many things about your application is that, under certain conditions (mostly read-only data, few active users, clearly defined batch and online windows), it is easy to build applications on Teradata (as on any other system). When some of these constraints are lifted, the lock architecture and the need to target single-AMP queries make it really difficult (not impossible, but really difficult) to achieve satisfying performance.