HttpEntryPoint.class.php File Reference
Detailed Description
- Developer log entry:
- Necdet Can Atesman (2009-11-17): Page handling is quite lame currently. The issues are:
- we're currently re-parsing all page files upon each request in development mode; there is really no need for that. Furthermore,
- the decision algorithm for determining the page file responsible for a request is horrible.
So the steps we need to take are the following:
- Caching: Should be obvious. Now that we have a nice function for recursively retrieving files with given suffixes, along with the most recently modified file's time stamp, we can cache very effectively.
- Collision checking: We're checking for URL collisions on each request (e.g. the URL /contact can be served by user.page and contact.page). This can be reduced to a single, but complicated, check before caching the results.
- Page determination: We're wasting too much time on deciding which URL maps to which page file. By demanding regular expressions from a page file, we could speed this process up a lot.
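The caching step could look roughly like the following. This is a minimal sketch, not the framework's code: `find_files`, `PageCache`, and the `parse` callback are hypothetical names, and the `.page` suffix is taken from the file names mentioned above. The cache is invalidated whenever the set of page files or the newest modification time changes.

```python
import os


def find_files(root, suffixes):
    """Return (sorted paths, newest mtime) of files under root with a given suffix."""
    paths, newest = [], 0.0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(tuple(suffixes)):
                path = os.path.join(dirpath, name)
                paths.append(path)
                newest = max(newest, os.path.getmtime(path))
    return sorted(paths), newest


class PageCache:
    """Re-parse page files only when a file was added, removed, or modified."""

    def __init__(self, root, parse):
        self.root = root
        self.parse = parse      # hypothetical callback: path -> parsed page
        self.stamp = None       # (paths, newest mtime) seen at the last parse
        self.pages = None
        self.parse_runs = 0     # counts full re-parses, for illustration only

    def load(self):
        paths, newest = find_files(self.root, [".page"])
        stamp = (tuple(paths), newest)
        if stamp != self.stamp:
            self.pages = {path: self.parse(path) for path in paths}
            self.stamp = stamp
            self.parse_runs += 1
        return self.pages
```

The single, expensive collision check would then run inside the `if` branch, right before the parsed pages are stored.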
But before implementing this, we should re-design URL template syntax. This is a list of dynamic values that can be present in URLs:
- Basic types: int, string, real, array<string>, etc. The regular expression for these can be easily generated. It would be really interesting if we could tie these values to members directly, so we can do even more checks (e.g. this string value is a username, implying that it must not be longer than X characters).
- User types: We already had this. Can either be the id of an object or another property. Example: /user/14 or /user/soulmerge. If the values available here are really dynamic, it might lead to serious problems with collisions. Eventually a clever attacker will find a way to work around pre-defined URLs, or innocent users will not have a personal page because another, static page is overriding his user name (like contact).
The issue here is that we can't check dynamic values for collisions too well. Either we forbid defining URL templates that have the slightest chance of collision - unfortunately, this is not possible: we really need this feature - or we make checks throughout the application. The biggest problem is that the developer needs to take these collisions into account in the code. He mustn't allow users with the name contact to be created, and every project needs to define such constraints on certain members (username mustn't be contact or details or about, etc.). This might even be manageable somehow; the real issue is that it should be possible to define conflicting URL templates with different user types. A CMS, for example, might define its "static" pages dynamically, while allowing access to other dynamic pages (like user profiles).
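One way to keep such constraints out of hand-written code would be to derive the forbidden values from the static URLs themselves. The sketch below is only an illustration under assumptions: `reserved_values` and `is_allowed_username` are hypothetical helpers, and it assumes a dynamic value occupies exactly one path segment after a fixed prefix.

```python
def reserved_values(static_urls, prefix):
    """Collect single-segment values under prefix that static URLs already occupy."""
    reserved = set()
    for url in static_urls:
        if url.startswith(prefix):
            rest = url[len(prefix):]
            # Only a single segment can collide with a one-segment dynamic value.
            if rest and "/" not in rest:
                reserved.add(rest.lower())
    return reserved


def is_allowed_username(name, reserved):
    """A username is acceptable only if no static URL would shadow its page."""
    return name.lower() not in reserved
```

For a template such as /$username, `reserved_values(["/contact", "/about", "/details"], "/")` yields the names that must not be handed out as usernames. This does not help with the CMS case, where the conflicting values are themselves dynamic.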
For the CMS, we generally need a new type of pattern, the catch-all token that accepts any string, including slashes - the manager of the CMS should be able to define his URLs as he wishes. This, again, leads to difficulties: This catch-all URL would need to have least priority, since everything else needs to be processed before that. On the other hand, the "static" pages created by the manager should have higher priority than user pages, for example. This could be solved by assigning a high priority to the catch-all URL and deciding if we can serve the request by querying the database first. This would mean we'd have to do additional checks besides the regular expression to decide which entry point will serve a request. Alternatively, the regular expression might be dynamically generated and cached to skip the additional checks (e.g. as ^(about|details|contact)). This would be OK for a small number of pages, but making that check for 20k users might be a bit difficult. Some quick tests show that using regular expressions for this is pointless, as the overhead of querying the (indexed) database column is still much less than compiling and executing PCRE patterns containing the database values. So the idea of controlling pages with regular expressions only looks like a dead end.
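The two alternatives compared above can be sketched side by side. This is an illustration only: it uses Python's `re` and an in-memory SQLite table instead of PCRE and the project's database, and the `pages` table, the `slug` column, and all function names are assumptions. It shows that both checks answer the same question; it makes no claim about timings, which would have to be measured (e.g. with `timeit`) on realistic data.

```python
import re
import sqlite3


def build_pattern(slugs):
    """Generate one alternation pattern containing every known page slug."""
    return re.compile("/(?:" + "|".join(re.escape(s) for s in slugs) + ")")


def pattern_has_page(pattern, url):
    """Regex variant: the whole URL must match the generated alternation."""
    return pattern.fullmatch(url) is not None


def make_db(slugs):
    """Create a table whose primary key gives us an indexed slug column."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pages (slug TEXT PRIMARY KEY)")
    db.executemany("INSERT INTO pages (slug) VALUES (?)", [(s,) for s in slugs])
    return db


def db_has_page(db, url):
    """Database variant: look the slug up in the indexed column."""
    if not url.startswith("/"):
        return False
    row = db.execute("SELECT 1 FROM pages WHERE slug = ?", (url[1:],)).fetchone()
    return row is not None
```

The regex variant has to be regenerated and recompiled whenever a page is added or renamed, whereas the database variant needs no cached artifact at all.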
The only remaining problem is an esthetic one: How do we define which page file overrides which others?
- We could require them to provide a priority value.
- A page file could state which other page files it overrides or defers to.
- An external resource (configuration file) could contain the relevance of all entry points.
- Developer log entry:
- Necdet Can Atesman (2009-11-18): Having looked at the routing of other frameworks (which basically use regular expressions and/or map class and function names), I decided that this issue can largely be solved without developer interaction. A set of rules can be applied safely:
- All static route names precede dynamic ones, i.e. /about matches before /$username.
- String variables match slashes, too. In order for this to work, URLs with multiple string values must precede those with fewer values. So, /objects/$tag/$page must be evaluated before /objects/$tag.
- In case of collisions, we can assume that the id always precedes other members, if this solves the conflict. /user/14 should always match the user with id 14, not the one with that name.
- To be able to express multiple values in the same URL, we can make use of the already-defined property paths. A slug would then become /blog/$post.slug/$post.id, whereas independent variables would have different names like /images/tagged/$tag1/$tag2.
- If a member in the URL is not marked as @unique, the variable must be an array. /images/named/$image.name will expect $image to be an array of images if the member $name is not unique.
Anything that is still ambiguous after these rules (like /user/$user.username and /user/$user.lastname) must be resolved manually. If the conflict is within a single file, it can be assumed that rules defined first override later ones.
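The first two rules and the definition-order tie-break can be sketched as a sort over the URL templates. This is a rough illustration, not the framework's implementation: all function names are hypothetical, every variable is treated as a string that may contain slashes, and the id-precedence and @unique rules are left out.

```python
import re

# A variable is "$name" or a property path such as "$post.slug".
VAR = re.compile(r"\$([A-Za-z_][A-Za-z0-9_.]*)")


def variables(template):
    """Return the variable names of a template, in order of appearance."""
    return VAR.findall(template)


def order_routes(templates):
    """Static routes first, then more variables before fewer, then definition order."""
    def key(item):
        index, template = item
        count = len(variables(template))
        return (0 if count == 0 else 1, -count, index)
    return [template for _, template in sorted(enumerate(templates), key=key)]


def to_regex(template):
    """Translate a template into a pattern whose variables also match slashes."""
    parts, pos = [], 0
    for match in VAR.finditer(template):
        parts.append(re.escape(template[pos:match.start()]))
        parts.append("(.+)")
        pos = match.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts))


def match_route(templates, url):
    """Return (template, values) for the first route in rule order that matches."""
    for template in order_routes(templates):
        found = to_regex(template).fullmatch(url)
        if found:
            return template, dict(zip(variables(template), found.groups()))
    return None
```

Note that /objects/$tag and /$username have the same number of variables, so which of them wins for /objects/php depends purely on definition order in this sketch; such cases are among those that still have to be resolved manually.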
Definition in file HttpEntryPoint.class.php.