I'll answer your question in an orthodox manner, with a twist of heresy:
Orthodoxy: You shouldn't store data in a column in a relational database that isn't "atomic."
Heresy: In your specific situation, you could -- maybe -- consider this blob of JSON to be an atomic object.
Years ago, Chris Date said it like this:
"A relation is said to be in first normal form (abbreviated 1NF) if and only if it satisfies the condition that it contains scalar values only"
Date, C. J. An Introduction to Database Systems, 6th edition (Addison-Wesley, 1995)
Later, he took a somewhat softer stance:
"1NF just means each tuple in the relation contains exactly one value, of the appropriate type, for each attribute. Observe in particular that 1NF places no limitations on what those attribute types are allowed to be."
Date, C. J. Database Design and Relational Theory: Normal Forms and All That Jazz (O'Reilly Media, 2012)
The "exactly one value" I'm arguing for here is "exactly one JSON object" (which could, in turn contain a JSON array).
Storing JSON in a column is a bad idea if you need the DBMS to manipulate it in any way, since, of course, it can't be indexed and queried the way properly normalized data can... but, arguably, if you really, really, really don't intend for the DBMS to do anything with what you're storing other than write it and read it back, a case can be made for storing a chunk of JSON in a single column, treating that chunk of JSON as a single atomic value.
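To make that concrete, here's a minimal sketch, assuming a hypothetical `events` table; the blob lives in a plain TEXT column, since the native JSON type didn't arrive until MySQL 5.7:

```sql
-- Hypothetical table: the JSON blob is one opaque, "atomic" value per row.
CREATE TABLE events (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    created_at DATETIME     NOT NULL,
    payload    TEXT         NOT NULL  -- one JSON object; written and read back, never dissected
) ENGINE=InnoDB;

INSERT INTO events (created_at, payload)
VALUES (NOW(), '{"user": 42, "tags": ["a", "b"], "score": 3.14}');
```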
The big objection, I think, to doing this is when it's done out of a lack of familiarity with the relational model, or out of laziness or naivete. Obviously, there are a lot of ways it could be done wrong, but I'd suggest that there's nothing inherently wrong about storing a chunk of JSON in a database, As Long As You Know What You're Doing.™
And, of course, you could use a MySQL FULLTEXT index on it, now that those are supported on InnoDB as of MySQL 5.6.
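If you wanted rough text search inside the blob, here's a sketch against the same hypothetical `events` table ('searchterm' is just a placeholder; InnoDB FULLTEXT requires MySQL 5.6 or later):

```sql
-- Full-text index over the JSON, treated as opaque text; the default
-- parser tokenizes on punctuation, so keys and values become the terms.
CREATE FULLTEXT INDEX ft_payload ON events (payload);

SELECT id, payload
FROM   events
WHERE  MATCH(payload) AGAINST ('searchterm' IN BOOLEAN MODE);
```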
Let's break this question up into a few parts.
Q: I need to insert 1mm rows a day. Is that a lot?
Not really. 1mm divided by 24 hours divided by 60 minutes divided by 60 seconds gives you about 12 inserts per second. For a rough frame of reference, it's not unusual to see 1,000 inserts per second on typical commodity servers with no tuning.
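A quick sanity check on that arithmetic, runnable anywhere you have a MySQL prompt:

```sql
-- 1,000,000 rows spread evenly over 24 * 60 * 60 = 86,400 seconds
SELECT 1000000 / (24 * 60 * 60) AS avg_inserts_per_second;  -- ≈ 11.57
```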
Granted, your load won't be perfectly averaged out like that - you'll have bursts of load - but I wouldn't make database platform decisions based on less than 10k-20k inserts per second. Any platform out there will work fairly well.
Q: How should I structure the data?
Zoom out - don't think table, think databases. If you're going to be keeping this data permanently, and it's truly insert-only with no updates, then you probably want to start a new database per time period. Your inserts may only go into one table in one database, but every year, create a new database (MyApp_2015) and seal the old 2014 data as read-only. You can stop backing it up (as long as you've still got a good backup once), stop doing index maintenance, statistics updates, etc.
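Sketched in MySQL terms, assuming a hypothetical application account 'app_user' (the exact grants will vary with your setup):

```sql
-- New year: create the database that current inserts will go into.
CREATE DATABASE MyApp_2015;

-- Seal last year: drop the application's write privileges, keep reads.
REVOKE INSERT, UPDATE, DELETE ON MyApp_2014.* FROM 'app_user'@'%';
GRANT  SELECT                  ON MyApp_2014.* TO 'app_user'@'%';
```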
The PHP will only ever have to know about the current database for inserts, making your design a lot easier. The archival process becomes a DBA task much later down the road as long as you go in knowing that there will be more than one database involved.
If you were doing more than 1,000 inserts per second sustained, and you wanted easier performance management, then I'd also suggest building sharding into the initial design regardless of the database platform. Don't get me wrong, any modern database can handle over 1,000 inserts per second, but designing sharding in now just gives you more flexibility later on. At 12 inserts per second, it's just not worth the design/testing hassle.
Q: How should I do reporting?
In an ideal world, reports would not be done against the live server. Run the reports against a restored or replicated copy of the database. This does two things: it reduces load on the live server, and it validates your backups, guaranteeing that you've got your valuable data elsewhere.
Two things:

1. Use the tool you already know. This simplifies development and reduces the desire to prematurely optimize.
2. Fit your environment to the requirements. The hardware required to store 400,000 × 10-byte rows per day is vastly different from the hardware required for 400,000 × 10,000-byte rows.
Define your requirements by mocking up some sample data in MySQL, along with a sample of the queries you'll run. See how much space this sample takes up, and do the math to determine the requirements for your data over a month, a year, and so forth. This will give you a much better idea of the kind of hardware you'll need.
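A minimal sketch of that sizing check, assuming your sample lives in a hypothetical schema named `sizing_test`:

```sql
-- After loading a representative sample, ask MySQL how much space
-- the data and indexes actually occupy.
SELECT table_name,
       table_rows,                          -- note: an estimate for InnoDB
       data_length  / 1024 / 1024 AS data_mb,
       index_length / 1024 / 1024 AS index_mb
FROM   information_schema.tables
WHERE  table_schema = 'sizing_test';
```

Divide by the sample row count to get a per-row footprint, then multiply out to a month or a year of inserts.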
My answer talks about the requirements for storing billions of rows in SQL Server and shows some of the things to consider. The other answers on that question are also quite instructive. As I note in my answer, don't store an IP address as a varchar. Use an integer.
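For IPv4 addresses, MySQL's built-in INET_ATON() and INET_NTOA() do the conversion for you; a 4-byte UNSIGNED INT replaces a VARCHAR of up to 15 characters (the `visits` table here is just for illustration):

```sql
CREATE TABLE visits (
    ip INT UNSIGNED NOT NULL   -- 4 bytes instead of up to 15 characters
);

-- Dotted quad in, integer stored.
INSERT INTO visits (ip) VALUES (INET_ATON('192.168.1.10'));

-- Integer out, dotted quad back for display.
SELECT INET_NTOA(ip) AS ip_address FROM visits;
```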