First, Wikipedia's database scheme diagram is available here. I have used one of their dumps locally to work with it.
Now, I need a single MySQL query to check both page
and redirect
tables with a given input 'Dog_breeder' or 'Dog_breeding' for example. I want to get the list of both types of pages linked to this input.
For example, Dog_breeding is a main topic and the below pages will redirect to it:
- Dog_Breeder
- Dog_breeder
- Dog_breeders
- Stud_dog
In the database, this is stored in this fashion:
|---------------------------| |---------------------------|
| table: page | | table: redirect |
|---------------------------| |---------------------------|
| page_id | page_title | | rd_from | rd_title |
|---------------------------| |---------------------------|
| 476072 | Dog_breeding | | 796236 | Dog_breeding |
| 796236 | Dog_breeder | | 16274701 | Dog_breeding |
| 16274701 | Dog_breeders | | 37353772 | Dog_breeding |
| 37353772 | Stud_dog | | 40738352 | Dog_breeding |
| 40738352 | Dog_Breeder | |---------------------------|
| 11606599 | Dog_breeders |
| 22534134 | Dog_breeders |
|---------------------------|
So all pages are in the page
table and then redirections are stored in the redirect
table (rd_from
is the page that will be redirected to the main rd_title
page)
I have the query to retrieve all redirect entries of a main topic:
SELECT
page_id,
page_title,
rd_from,
rd_title
FROM
redirect
JOIN page ON page_id = rd_from
WHERE rd_title = 'Dog_breeding'
But two problems:
- it is not giving me the main page itself ('Dog_breeding')
- if I put
WHERE rd_title = 'Dog_breeder'
it finds nothing…
I would like to get something like the below regardless of the input being Dog_breeding
or Dog_breeder
:
page_id | page_title | rd_from | rd_title
-------------------------------------------------------
476072 | Dog_breeding | |
796236 | Dog_breeder | 796236 | Dog_breeding
16274701 | Dog_breeders | 16274701 | Dog_breeding
37353772 | Stud_dog | 37353772 | Dog_breeding
40738352 | Dog_Breeder | 40738352 | Dog_breeding
I guess I could do it with PHP and several back and forth with the Database (check if it's a main topic, if not grab the main topic, then query like I did above) but I feel like there is a one-query solution?
Best Answer
Initially it seems pretty easy to solve the question, by adding a new criteria to the WHERE clause.
But as Lazhar has pointed out, this only works if the requested text belongs to the main wiki page, not for the redirected pages.
After taking a look at table definitions:
MediaWiki - Manual:page table
MediaWiki - Manual:redirect table
I've decided to use this field to solve the problem.
First we need to know the title of the main page, the page that is not a redirection. In this question:
Dog_breeding
Then using the first approach:
This is the result:
Can check it here: http://rextester.com/OQMV97513
NOTE: There is another solution using
pagelinks
table in SO