I've always been bothered by any redundancy or duplication in code. I wrote about it many years
ago. Looking at this code just makes me suffer:
interface ContainerAwareInterface
{
    /**
     * Sets the container.
     */
    public function setContainer(ContainerInterface $container = null);
}
Let's set aside the unnecessary comment on the method for now. And, this time,
also the misunderstanding of dependency injection that a library reveals when
it needs such an interface. The fact that using the word Interface in the name
of an interface is itself a sign of not understanding object-oriented
programming deserves a separate article, which I'm planning. After all, I've
been there myself.
But why on earth specify the visibility as public? It's a pleonasm. If it
weren't public, it wouldn't be an interface, right? And then someone thought to
make it a “standard”.
Sorry for the long introduction. What I'm getting at is whether to write
optional nullable types with or without a question mark. So:
// without
function setContainer(ContainerInterface $container = null);
// with
function setContainer(?ContainerInterface $container = null);
Personally, I have always leaned towards the first option, because the
information given by the question mark is redundant (yes, both notations mean
the same from the language's perspective). This is how all the code was written
until the arrival of PHP 7.1, the version that added the question mark, and
there would have to be a good reason to change it suddenly.
With the arrival of PHP 8.0, I changed my mind and I'll explain why. The
question mark is not optional in the case of properties. PHP will throw an error
in this case:
class Foo
{
    private Bar $foo = null;
}
// Fatal error: Default value for property of type Bar may not be null.
// Use the nullable type ?Bar to allow null default value
And from PHP 8.0 you can use promoted properties, which allow you to write
code like this:
class Foo
{
    public function __construct(
        private ?Bar $foo = null,
        string $name = null,
    ) {
        // ...
    }
}
Here you can see the inconsistency. If ?Bar is used (which is necessary), then
?string should follow on the next line. And if I use the question mark in some
cases, I should use it in all cases.
The question remains whether it is better to use a union type string|null
instead of a question mark. For example, if I wanted to write
Stringable|string|null, maybe the version with a question mark isn't necessary
at all.
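To make the comparison concrete, here is a minimal sketch (PHP 8.0+) of the constructor above with every nullable type written explicitly. Bar is only the placeholder class from the example, and describe() is a hypothetical helper added so the result can be observed. Unlike in the example above, $name is also promoted here so that it can be stored.

```php
<?php
declare(strict_types=1);

// Placeholder class from the article's example; it has no behavior here.
final class Bar
{
}

final class Foo
{
    public function __construct(
        private ?Bar $foo = null,                    // shorthand for Bar|null
        private Stringable|string|null $name = null, // cannot be written with a question mark
    ) {
    }

    // Hypothetical helper, present only to make the stored values visible.
    public function describe(): string
    {
        return ($this->foo === null ? 'no bar' : 'bar')
            . ', '
            . ($this->name === null ? 'no name' : (string) $this->name);
    }
}

echo (new Foo)->describe(), "\n";                // no bar, no name
echo (new Foo(new Bar, 'demo'))->describe(), "\n"; // bar, demo
```

The question mark covers only the single-type case; as soon as a second type joins the union, null has to be spelled out anyway.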
Update: PHP 8.4 deprecates the notation without a question mark, so the
explicit nullable form is the one to use.
PHP 8.1 introduces an interesting feature: readonly member variables.
Let's start with an example of how to use them:
class Test
{
    public readonly string $prop;

    public function setProp(string $prop): void
    {
        $this->prop = $prop; // legal initialization
    }
}

$test = new Test;
$test->setProp('abc');
echo $test->prop; // legal read
$test->prop = 'foo'; // throws exception: Cannot modify readonly property Test::$prop
Once initialized, a variable cannot be overwritten with another value.
Scope
Interestingly, attempting to assign a value to $test->prop will also throw an
exception even if the variable hasn't been initialized:
$test = new Test;
$test->prop = 'foo';
// throws exception too: Cannot initialize readonly property Test::$prop from global scope
This will even throw an exception:
class Child extends Test
{
    public function __construct()
    {
        $this->prop = 'hello';
        // throws exception: Cannot initialize readonly property Test::$prop from scope Child
    }
}
A readonly variable simply cannot be written from anywhere other than the
class that defined it. Quite peculiar.
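Violations of readonly are reported as an Error, so code that wants to react to them has to catch Error rather than Exception. A small sketch, with Account as a hypothetical class; the exact message wording may differ between PHP versions, so only the error type is relied upon:

```php
<?php
declare(strict_types=1);

// Hypothetical class used only for this illustration.
final class Account
{
    public readonly string $owner;

    public function __construct(string $owner)
    {
        $this->owner = $owner; // initialization inside the declaring class
    }
}

$account = new Account('alice');

try {
    $account->owner = 'bob'; // write attempt from outside the class
    $result = 'modified';
} catch (Error $e) {
    $result = 'rejected';
}

echo $result, ': ', $account->owner, "\n"; // rejected: alice
```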
Immutability
The fact that the content of readonly variables cannot be changed doesn't
mean the data written to them is immutable. If an object is written to such a
variable, its internal variables can still be modified. The object does not
become immutable.
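A short sketch of that difference, with Settings and Service as hypothetical classes: the object stored in the readonly property can be changed from within, but the property cannot be pointed at another object.

```php
<?php
declare(strict_types=1);

// Hypothetical mutable object stored in a readonly property.
final class Settings
{
    public array $values = [];
}

final class Service
{
    public function __construct(
        public readonly Settings $settings,
    ) {
    }
}

$service = new Service(new Settings);

// Allowed: the Settings object itself is modified, the property still
// points to the same object.
$service->settings->values['debug'] = true;

try {
    // Not allowed: this would replace the content of the readonly property.
    $service->settings = new Settings;
    $replaced = true;
} catch (Error $e) {
    $replaced = false;
}

var_dump($service->settings->values); // contains 'debug' => true
var_dump($replaced);                  // bool(false)
```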
The same applies to arrays, although the behavior is slightly different here.
Changing elements in the array is considered a change to the entire array and as
such is impermissible in a readonly variable. However, if the array contains an
element that is a reference, changing its content is not considered a change to
the entire array and thus can occur in a readonly variable. This, however, is
standard PHP behavior.
In other words, this is possible:
class Test
{
    public readonly array $prop;

    public function test(): void
    {
        $item = 'foo';
        $this->prop = [1, &$item, 2];
        dump($this->prop); // [1, 'foo', 2]
        $item = 'bar'; // legal
        dump($this->prop); // [1, 'bar', 2]
    }
}
But this is not possible:
class Test
{
    public readonly array $prop;

    public function test(): void
    {
        $this->prop = ['a', 'b'];
        $this->prop[1] = 'c'; // throws exception!
    }
}
Type
Since readonly variables utilize the ‘uninitialized’ state, which exists
for variables with a defined type, it is only possible to declare a variable as
readonly in conjunction with a data type.
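The uninitialized state can be observed through reflection. The following sketch uses a hypothetical Document class and ReflectionProperty::isInitialized() to show the property before and after its single permitted write:

```php
<?php
declare(strict_types=1);

// Hypothetical class used only for this illustration.
final class Document
{
    public readonly string $title;

    public function publish(string $title): void
    {
        $this->title = $title; // the one and only initialization
    }
}

$doc = new Document;
$property = new ReflectionProperty(Document::class, 'title');

$before = $property->isInitialized($doc);
$doc->publish('Readonly properties');
$after = $property->isInitialized($doc);

var_dump($before); // bool(false)
var_dump($after);  // bool(true)
```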
I was curious about which PHP framework has the best
documentation and how Nette ranks among them. But how can you find out?
We all know that the worst scenario is having no documentation at all,
followed by inadequate documentation. The opposite is extensive documentation.
It seems, therefore, that the sheer volume of documentation is an important
indicator. Of course, its understandability and currency, as well as readability
and accuracy, play a huge role. These factors are very difficult to measure.
However, I know from my own experience how many sections of
Nette's documentation I have rewritten multiple times to make them clearer,
and how many corrections I have merged, and I assume this happens with any
long-standing framework. Thus, it appears that all documentation gradually
converges towards a similar high quality. Therefore, I allow myself to take the
sheer volume of data as a guide, though it is an oversimplification.
Of course, the volume of documentation must be proportional to the size of
the library itself. Some are significantly larger than others and should
accordingly have significantly more documentation. For simplicity, I determine
the size of the library by the volume of PHP code, normalized for white space
and excluding comments.
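How such a measurement might be done is sketched below. This is not necessarily the script used for the chart; measureCodeSize() is a hypothetical helper that drops comment tokens, collapses every run of whitespace into one space, and counts the remaining bytes. It assumes PHP 8.0+ with the tokenizer extension.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: size of PHP source without comments and with
// whitespace normalized. Whitespace inside string literals is collapsed
// too, which is acceptable for a rough metric.
function measureCodeSize(string $source): int
{
    $code = '';
    foreach (PhpToken::tokenize($source) as $token) {
        if (!$token->is([T_COMMENT, T_DOC_COMMENT])) {
            $code .= $token->text;
        }
    }

    return strlen(trim(preg_replace('~\s+~', ' ', $code)));
}

$commented = <<<'CODE'
<?php
// adds two numbers
function add($a, $b)
{
    /** @return int */
    return $a + $b;
}
CODE;

$compact = '<?php function add($a, $b) { return $a + $b; }';

// Both variants contain the same code, so they measure the same.
var_dump(measureCodeSize($commented) === measureCodeSize($compact)); // bool(true)
```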
I created a chart showing the ratio of English documentation to code for the
well-known frameworks CakePHP (4.2), CodeIgniter (3.1), Laravel (8.62), Nette
(3.1), Symfony (5.4), Yii (2.0), and Zend Framework (2.x, no longer in
development):
As you can see from the chart, the extent of documentation relative to the
code is more or less similar across all frameworks.
CodeIgniter stands out. I tip my hat to CakePHP and Yii, which strive to
maintain documentation in a range of other languages. The comprehensiveness of
Nette's documentation is above average. Additionally, Nette is the only
framework that has a 1:1 translation in our native language.
The purpose of the chart is NOT to show that one framework has so many
percent more comprehensive documentation than another. The metric is too
primitive for that. Instead, the purpose is to show that the extent of
documentation among the various frameworks is largely comparable. I created it
mainly for myself, to get an idea of how Nette's documentation compares to its
competitors.
Originally published in August 2019, data updated for
October 2021.
Request termination in PHP consists of the following steps performed in the
specified order:
1. Calling all functions registered using register_shutdown_function()
2. Calling all __destruct() methods
3. Flushing all output buffers
4. Terminating all PHP extensions (e.g., sessions)
5. Shutting down the output layer (sending HTTP headers, clearing output
   handlers, etc.)
We will focus in more detail on step 2, i.e., the calling of destructors. Of
course, object destruction may occur in the first step, i.e., during the calling
of registered shutdown functions, for example, if one of the functions held the
last reference to an object or if the shutdown function itself was an
object.
The calling of destructors occurs as follows:
1. PHP first attempts to unset objects in the global symbol table.
2. Then it calls the destructors of all remaining objects.
3. If execution is stopped, for example due to exit(), the remaining
   destructors are not called.
ad 1) PHP goes through the global symbol table in reverse, i.e., it starts
from the variable that was created last and proceeds to the variable that was
created first. During this traversal, it unsets all objects with refcount=1.
This iteration is performed as long as such objects exist.
Basically, what is done is that a) all unused objects in the global symbol
table are removed b) if new unused objects appear, they are also removed c) and
so forth. This method of destruction is used so that objects can depend on other
objects in the destructor. Usually, this works well if the objects in the global
scope do not have complicated (e.g., circular) interdependencies.
The destruction of the global symbol table significantly differs from the
destruction of other symbol tables. Therefore, PHP uses a smarter algorithm for
the global symbol table that tries to respect the dependencies of objects.
ad 2) The remaining objects are traversed in the order they were created and
their destructors are called. Yes, PHP only calls __destruct, but it does not
actually unset the object (nor does it even change its refcount). Therefore,
if other objects still reference it, it will still be available (even though the
destructor has already been called). In a sense, they will be using some sort of
“half-destroyed” object.
ad 3) If execution is stopped during the calling of destructors, e.g., due to
exit(), the remaining destructors are not called. Instead, PHP marks the objects
as already destroyed. An important consequence is that the calling of
destructors is not guaranteed. Cases where this happens are rather rare, but
they can occur.
Source https://stackoverflow.com/…ucted-in-php
When writing your own error handler for PHP, it is absolutely
necessary to follow several rules. Otherwise, it can disrupt the behavior of
other libraries and applications that do not expect treachery in the error
handler.
Parameters
The signature of the handler looks like this:
function errorHandler(
    int $severity,
    string $message,
    string $file,
    int $line,
    ?array $context = null // only in PHP < 8
): ?bool {
    // ...
}
The $severity parameter contains the error level (E_NOTICE, E_WARNING, …).
Fatal errors such as E_ERROR cannot be caught by the handler, so this parameter
will never have these values. Fortunately, fatal errors have essentially
disappeared from PHP and have been replaced by exceptions.
The $message parameter is the error message. If the html_errors directive is
enabled, special characters like < are written as HTML entities, so you need to
decode them back to plain text. However, beware, some characters are not written
as entities, which is a bug. Displaying errors in pure PHP is thus prone to XSS.
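Decoding the message might look like the following sketch. plainErrorMessage() is a hypothetical helper, and passing the html_errors state as a parameter is only a choice made here to keep the function easy to test; in a real handler it would be read from ini_get('html_errors').

```php
<?php
declare(strict_types=1);

// Hypothetical helper: converts an error message back to plain text when
// PHP produced it with html_errors enabled.
function plainErrorMessage(string $message, bool $htmlErrors): string
{
    if (!$htmlErrors) {
        return $message; // nothing was encoded
    }

    return html_entity_decode($message, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo plainErrorMessage('Unexpected &lt; in &quot;template&quot;', true), "\n";
// Unexpected < in "template"
```

The decoded text is plain text again, so it has to be escaped once more before it is printed into an HTML page.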
The $file and $line parameters represent the name of the file and the line
where the error occurred. If the error occurred inside eval(), $file will be
supplemented with this information.
Finally, the $context parameter contains an array of local variables, which is
useful for debugging, but this has been removed in PHP 8. If the handler is to
work in PHP 8, omit this parameter or give it a default value.
Return Value
The return value of the handler can be null or false. If the handler returns
null, nothing happens. If it returns false, the standard PHP handler is also
called. Depending on the PHP configuration, this can print or log the error.
Importantly, it also fills in internal information about the last error, which
is accessible by the error_get_last() function.
Suppressed Errors
In PHP, error display can be suppressed either using the shut-up operator @ or
by error_reporting():
// suppress E_USER_DEPRECATED level errors
error_reporting(~E_USER_DEPRECATED);
// suppress all errors when calling fopen()
$file = @fopen($name, 'r');
Even when errors are suppressed, the handler is still called.
Therefore, it is first necessary to verify whether the error is
suppressed, and if so, we must end our own handler:
if (!($severity & error_reporting())) {
    return false;
}
However, in this case, we must end it with return false, so that the standard
error handler is still executed. It will not print or log anything (because the
error is suppressed), but ensures that the error can be detected using
error_get_last().
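The effect of return false on suppressed errors can be verified with a small experiment. In this sketch, the $handled array is a hypothetical stand-in for whatever the handler really does with an error (displaying, logging, and so on).

```php
<?php
declare(strict_types=1);

error_reporting(E_ALL);

// Hypothetical stand-in for real processing: remember handled messages.
$handled = [];

set_error_handler(function (int $severity, string $message) use (&$handled): ?bool {
    if (!(error_reporting() & $severity)) {
        return false; // suppressed: leave it to the standard handler
    }
    $handled[] = $message;
    return null; // processed here, the standard handler is skipped
});

error_clear_last();
@trigger_error('suppressed warning', E_USER_WARNING);
$last = error_get_last();

trigger_error('visible warning', E_USER_WARNING);

restore_error_handler();

var_dump($last['message'] ?? null); // string(18) "suppressed warning"
var_dump($handled);                 // only 'visible warning'
```

The suppressed warning is neither printed nor passed to the processing part of the handler, yet it remains detectable through error_get_last().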
Other Errors
If our handler processes the error (for example, displays its own message,
etc.), there is no need to call the standard handler. Although then it will not
be possible to detect the error using error_get_last(), this does not matter in
practice, as this function is mainly used in combination with the shut-up
operator.
If, on the other hand, the handler does not process the error for any reason,
it should return false so as not to conceal it.
Example
Here's what the code for a custom error handler that transforms errors into
ErrorException exceptions might look like:
set_error_handler(function (int $severity, string $message, string $file, int $line) {
    if (!(error_reporting() & $severity)) {
        return false;
    }
    throw new \ErrorException($message, 0, $severity, $file, $line);
});
SameSite cookies provide a mechanism to recognize what led to the loading of a
page: whether it was clicking a link on another website, submitting a form,
loading inside an iframe, using JavaScript, etc.
Identifying how a page was loaded is crucial for security. The serious
vulnerability known as Cross-Site Request Forgery (CSRF) has been with us for
over twenty years, and SameSite cookies offer a systematic way to address it.
A CSRF attack involves an attacker luring a victim to a webpage that
inconspicuously makes a request to a web application where the victim is logged
in, and the application believes the request was made voluntarily by the victim.
Thus, under the identity of the victim, some action is performed without the
victim knowing. This could involve changing or deleting data, sending a message,
etc. To prevent such attacks, applications need to distinguish whether the
request came from a legitimate source, e.g., by submitting a form on the
application itself, or from elsewhere. SameSite cookies can do this.
How does it work? Let’s say I have a website running on a domain, and I create
three different cookies with attributes SameSite=Lax, SameSite=Strict, and
SameSite=None. Name and value do not matter. The browser will store them.
1. When I open any URL on my website by typing directly into the address bar
   or clicking on a bookmark, the browser sends all three cookies.
2. When I access any URL on my website from a page on the same website, the
   browser sends all three cookies.
3. When I access any URL on my website from a page on a different website, the
   browser sends only the cookies with None and, in certain cases, Lax; see the
   table:
|           | Code on another website         | Sent cookies |
|-----------|---------------------------------|--------------|
| Link      | <a href="…">                    | None + Lax   |
| Form GET  | <form method="GET" action="…">  | None + Lax   |
| Form POST | <form method="POST" action="…"> | None         |
| iframe    | <iframe src="…">                | None         |
| AJAX      | $.get('…'), fetch('…')          | None         |
| Image     | <img src="…">                   | None         |
| Prefetch  | <link rel="prefetch" href="…">  | None         |
| …         |                                 | None         |
SameSite cookies can distinguish only a few cases, but these are crucial for
protecting against CSRF.
If, for example, there is a form or a link for deleting an item on my
website's admin page and it was sent/clicked, the absence of a cookie created
with the Strict attribute means it did not happen on my website but rather the
request came from elsewhere, indicating a CSRF attack.
Create the cookie used for detecting a CSRF attack as a so-called session
cookie, i.e., without the Expires attribute; its validity is then essentially
infinite.
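In plain PHP (7.3 or newer, which accepts the options array in setcookie()), the detection could be sketched as follows. The cookie name samesite_check and both function names are hypothetical, chosen only for this illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical cookie name used only in this sketch.
const SAMESITE_CHECK_COOKIE = 'samesite_check';

// Sends the detection cookie. There is no 'expires' key, so it is
// a session cookie.
function sendSameSiteCheckCookie(): bool
{
    return setcookie(SAMESITE_CHECK_COOKIE, '1', [
        'path' => '/',
        'httponly' => true,
        'samesite' => 'Strict',
    ]);
}

// The Strict cookie arrives only with same-site requests, so its absence
// means the request came from elsewhere (or that the visitor has not been
// on the website yet).
function isSameSiteRequest(array $cookies): bool
{
    return ($cookies[SAMESITE_CHECK_COOKIE] ?? null) === '1';
}

var_dump(isSameSiteRequest([SAMESITE_CHECK_COOKIE => '1'])); // bool(true)
var_dump(isSameSiteRequest([]));                             // bool(false)
```

In an application, sendSameSiteCheckCookie() would be called on every page that does not have the cookie yet, and isSameSiteRequest($_COOKIE) would be checked before performing any action.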
Domain vs Site
“On my website” is not the same as “on my domain”; it's not about the domain,
but about the website (hence the name SameSite). Although the site often
corresponds to the domain, for services like github.io it corresponds to the
subdomain. A request from doc.nette.org to files.nette.org is same-site, while
a request from nette.github.io to tracy.github.io is already cross-site. Here
it is nicely explained.
<iframe>
From the previous lines, it is clear that if a page from my website is loaded
inside an <iframe> on another website, the browser does not send Strict or Lax
cookies. But there's another important thing: if such a loaded page creates
Strict or Lax cookies, the browser ignores them.
This creates a possibility to defend against fraudulent acquisition of cookies,
or Cookie Stuffing, where until now a systemic defense was also lacking. The
trick is that the fraudster collects a commission for affiliate marketing,
although the user was not brought to the merchant's website by a user-clicked
link. Instead, an invisible <iframe> with the same link is inserted into the
page, marking all visitors.
Cookies without the SameSite Attribute
Cookies without the SameSite attribute were always sent during both same-site
and cross-site requests, just like SameSite=None. However, in the near future,
browsers will start treating SameSite=Lax as the default, so cookies without
the attribute will be considered Lax. This is an unusually large BC break in
browser behavior. If you want the cookie to continue to behave the same and be
transmitted during any cross-site request, you need to set it to SameSite=None.
(Unless you develop embedded widgets, etc., you probably won't want this often.)
Unfortunately, for last year's browsers, the None value is unexpected. Safari 12
interprets it as Strict, thus creating a tricky problem on older iOS and macOS.
And note: None works only when set together with the Secure attribute.
What to Do in Case of an Attack?
Run away! The basic rule of self-defense, both in real life and on the web.
A huge mistake made by many frameworks is that upon detecting a CSRF attack,
they display the form again and write something like “The CSRF token is
invalid. Please try to submit the form again”. By resubmitting the form,
the attack is completed. Such protection lacks sense when you actually invite
the user to bypass it.
Until recently, Chrome did this: when a page loaded by a cross-site request was
refreshed, it displayed the page again, but this time sent the cookies with the
Strict attribute. So, the refresh eliminated the CSRF protection based on
SameSite cookies. Fortunately, it no longer does this today, but it's possible
that other or older browsers still do. A user can also “refresh” the page by
clicking on the address bar + enter, which is considered a direct URL entry
(point 1), and all cookies are sent.
Thus, the best response to detecting CSRF is to redirect with a 302 HTTP
code elsewhere, perhaps to the homepage. This rids you of dangerous POST data,
and the problematic URL isn't saved to history.
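A possible shape of such a response in plain PHP is sketched below. Both functions are hypothetical; the decision is kept separate from the sending of headers so that it can be tested without a web server.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: decides how to respond to a request.
// Returns null when the request may proceed, otherwise the redirect
// that should be sent instead of processing it.
function csrfResponse(bool $isSameSite, string $homepage = '/'): ?array
{
    if ($isSameSite) {
        return null;
    }

    return ['status' => 302, 'location' => $homepage];
}

// Hypothetical helper: sends the redirect and stops the script, so the
// POST data is never processed and no form is displayed again.
function sendRedirect(array $response): void
{
    header('Location: ' . $response['location'], true, $response['status']);
    exit;
}

var_dump(csrfResponse(true));  // NULL
var_dump(csrfResponse(false)); // status 302, location '/'
```

In the application, the result of csrfResponse() would be passed to sendRedirect() whenever it is not null.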
Incompatibilities
For a long time, SameSite didn't work nearly as well as it should have, mainly
due to browser bugs and deficiencies in the specification, which, for example,
didn't address redirects or refreshes. SameSite cookies weren't sent when
saving or printing a page, but were sent after a refresh when they shouldn't
have been, etc. Fortunately, the situation is better today. I believe the only
serious shortcoming that persists in current browser versions is the one in
Safari mentioned above.
Addendum: Besides SameSite, the origin of a request can, since very recently,
also be distinguished by the Origin header, which is more privacy-respecting
and more accurate than the Referer header.
Content Security Policy (CSP) is an additional security feature that tells
the browser what external sources a page can load and how it can be displayed.
It protects against the injection of malicious code and attacks such as XSS. It
is sent as a header composed of a series of
directives. However, implementing it is not trivial.
Typically, we want to use JavaScript libraries located outside our server,
such as Google Analytics, advertising systems, captchas, etc. Unfortunately, the
first version of CSP fails here. It requires a precise analysis of the content
loaded and the setting of the correct rules. This means creating a whitelist, a
list of all the domains, which is not easy since some scripts dynamically pull
other scripts from different domains or are redirected to other domains, etc.
Even if you take the effort and manually create the list, you never know what
might change in the future, so you must constantly monitor if the list is still
up-to-date and correct it. Analysis by Google showed that even this meticulous
tuning ultimately results in allowing such broad access that the whole purpose
of CSP falls apart, just sending much larger headers with each request.
CSP level 2 approaches the problem differently using a nonce, but only the
third version of the solution completed the process. Unfortunately, as of 2019,
it does not have sufficient browser support.
As for how to assemble the script-src and style-src directives so that they
work correctly even in older browsers and with minimal effort, I have written
a detailed article in the Nette partner section. Essentially, the resulting
form might look like this:
script-src 'nonce-XXXXX' 'strict-dynamic' * 'unsafe-inline'
style-src 'nonce-XXXXX' * 'unsafe-inline'
Example of Use in PHP
We generate a nonce and send the header:
$nonce = base64_encode(random_bytes(16));
header("Content-Security-Policy: script-src 'nonce-$nonce' 'strict-dynamic' * 'unsafe-inline'");
And we insert the nonce into the HTML code:
<script nonce="<?=$nonce?>" src="..."></script>
Example of Use in Nette
Since Nette has had built-in support for CSP and nonces since version 2.4,
simply specify this in the configuration file:
http:
    csp:
        script-src: [nonce, strict-dynamic, *, unsafe-inline]
        style-src: [nonce, *, unsafe-inline]
And then use it in templates:
<script n:nonce src="..."></script>
<style n:nonce>...</style>
Monitoring
Before you set new rules for CSP, try them out first using the
Content-Security-Policy-Report-Only header. This header works in all browsers
that support CSP. If a rule is violated, the browser does not block the script
but instead sends a notification to the URL specified in the report-uri
directive. To receive and analyze these notifications, you might use a service
like Report URI.
http:
    cspReportOnly:
        script-src: [nonce, strict-dynamic, *, unsafe-inline]
        report-uri: https://xxx.report-uri.com/r/d/csp/reportOnly
You can use both headers simultaneously, with Content-Security-Policy carrying
the verified and active rules and Content-Security-Policy-Report-Only used to
test their modifications. Of course, you can also monitor violations of the
active rules.
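Without Nette, the report-only header can be assembled by hand. The following sketch uses a hypothetical helper buildCspHeader(), in which the keyword nonce is replaced by the generated value, loosely mirroring the configuration shown earlier; the reporting URL is the placeholder from the example above.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: builds a complete CSP header line from a list of
// directives. The source 'nonce' is replaced by the actual nonce value.
function buildCspHeader(string $name, array $directives, string $nonce): string
{
    $parts = [];
    foreach ($directives as $directive => $sources) {
        $sources = array_map(
            fn(string $source): string => $source === 'nonce' ? "'nonce-$nonce'" : $source,
            $sources
        );
        $parts[] = $directive . ' ' . implode(' ', $sources);
    }

    return $name . ': ' . implode('; ', $parts);
}

$nonce = base64_encode(random_bytes(16));

$reportOnly = buildCspHeader('Content-Security-Policy-Report-Only', [
    'script-src' => ['nonce', "'strict-dynamic'", '*', "'unsafe-inline'"],
    'report-uri' => ['https://xxx.report-uri.com/r/d/csp/reportOnly'],
], $nonce);

echo $reportOnly, "\n";
```

The resulting string can be passed to header() as is, next to the enforcing Content-Security-Policy header.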
It's a bit like when you spot a poster for a concert by a band you remember
from your youth. Are they still playing? Or did they get back together after
years because they need the money? Perhaps to cash in on the strings of
nostalgia?
Texy is my first open-source project. I started writing it fifteen years ago.
Texy has survived several version control systems. Numerous web services hosting
repositories. Several string encodings. Various markup languages for creating
websites. Several of my life relationships. A number of cities I've lived in.
Texy is still here because there is nothing better.
So, I have kept it up-to-date for fifteen years. We started in PHP 4, which
was the worst programming language in the world and thus a challenge, then moved
on to PHP 5 with relief, a few years later we transitioned to namespaces
(Texy\Parser instead of TexyParser, wow), watched PHP stop being the worst
language in the world, which frustrated many programmers who then turned to
JavaScript, then God created PHP 7 and with it type hints
(Texy::process(string $text): string megawow), and strictness came into fashion
with declare(strict_types=1) and we honor that.
And so here is Texy 3.0. It's the same as the previous versions, but with all
the bells and whistles of PHP 7.1. It's the same because you don't mess with
perfection.
Texy was here when you were born, in programming terms. Someday, Texy might
even format your epitaph. And it will insert a non-breaking space between a and
room.
How do you mock classes that are defined as final, or whose methods are final?
Mocking means replacing the original object with a test imitation that performs
no real functionality and only looks like the original object, while simulating
the behavior we need for the test.
For example, instead of a PDO with methods like query() etc., we create a mock
that pretends to work with the database and instead verifies that the correct
SQL statements are called, etc. More can be found, e.g., in the Mockery
documentation.
And in order to pass the mock to methods that use the PDO type hint, the mock
class must inherit from PDO. And that can be a stumbling block. If PDO or its
method query() were final, it would not be possible.
Is there any solution? The first option is not to use the final keyword at
all. This, of course, does not help with third-party code that uses it, and
above all it throws away an important element of object design. For example,
there is a dogma that every class should be either final or abstract.
The second and very handy option is to use BypassFinals, which removes
finals from source code on-the-fly and allows mocking of final methods and
classes.
Install it using Composer:
composer require dg/bypass-finals --dev
And just call at the beginning of the test:
require __DIR__ . '/vendor/autoload.php';
DG\BypassFinals::enable();
That's all. Incredible black magic.
BypassFinals requires PHP version 5.6 and supports PHP up to 7.2. It can be
used together with any test tool such as PHPUnit or Mockery.
This functionality is built directly into Nette Tester
(https://tester.nette.org) version 2.0 and can be enabled this way:
require __DIR__ . '/vendor/autoload.php';
Tester\Environment::bypassFinals();
A naming conundrum: how to collectively refer to classes and interfaces? For
instance, what should you call a variable that could contain either a class or
an interface name? What should be used instead of $class?
One might consider the term type ($type), but this is quite generic because a
type can also be a string or an array. From the perspective of the language, a
type could be something more complex, such as ?array. Moreover, it's debatable
what constitutes the type of an object: is it the class name, or is it object?
However, there indeed exists a collective term for classes and interfaces: it
is the word class.
How so?
1. From a declaration standpoint, an interface is essentially a stripped-down
   class. It can only contain public abstract methods, which also implies that
   objects cannot be created. Therefore, interfaces are a subset of classes. If
   something is a subset, we can refer to it by the name of the superset. Just
   as a human is a mammal, an interface is a class.
2. Nevertheless, there's also the usage perspective. A class can inherit from
   only one class but can implement multiple interfaces. However, this
   limitation pertains to classes, not to the interfaces themselves. Similarly,
   a class cannot inherit from a final class, but we still perceive the final
   class as a class. Also, if a class can implement multiple interfaces (i.e.,
   classes, see 1.), we still regard them as classes.
And what about traits? They simply do not belong here, as they do not exist
from an OOP standpoint.
Thus, the issue of naming classes and interfaces together is resolved.
Let’s simply call them classes.
classes + interfaces = classes
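PHP's own reflection API follows the same convention: ReflectionClass describes interfaces as well. The sketch below uses hypothetical names (Shape, Circle, classLikeExists()) to show both the shared reflection and the one place where the two still have to be checked separately.

```php
<?php
declare(strict_types=1);

// Hypothetical types used only for this illustration.
interface Shape
{
}

final class Circle implements Shape
{
}

// Hypothetical helper: true for a name of either a class or an interface.
function classLikeExists(string $class): bool
{
    return class_exists($class) || interface_exists($class);
}

foreach ([Circle::class, Shape::class] as $class) {
    $reflection = new ReflectionClass($class); // works for both kinds
    echo $class, ': ', $reflection->isInterface() ? 'interface' : 'class', "\n";
}
// Circle: class
// Shape: interface

var_dump(class_exists(Shape::class));    // bool(false)
var_dump(classLikeExists(Shape::class)); // bool(true)
```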
Well, but a new problem has arisen. How to refer to classes that are not
interfaces? That is, their complement. What was referred to at the beginning of
the article as classes. Non-interfaces? Or “implementations”?
That's an even tougher nut to crack. You know what, let's forget that
interfaces are also classes and again pretend that every OOP identifier is
either a class or an interface. It will be easier.