Snippets

You can't parse [X]HTML with regex

If you enjoy Lovecraftian horror mixed hilariously with nerdy advice, this might be your thing:

You can’t parse [X]HTML with regex. Because HTML can’t be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. As I have answered in HTML-and-regex questions here so many times before, the use of regex will not allow you to consume HTML. Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts. so many times but it is not getting to me. Even enhanced irregular regular expressions as used by Perl are not up to the task of parsing HTML. You will never make me crack. HTML is a language of sufficient complexity that it cannot be parsed by regular expressions. Even Jon Skeet cannot parse HTML using regular expressions. Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp. Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together like love, marriage, and ritual infanticide. The <center> cannot hold it is too late. The force of regex and HTML together in the same conceptual space will destroy your mind like so much watery putty. If you parse HTML with regex you are giving in to Them and their blasphemous ways which doom us all to inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane, he comes. HTML-plus-regexp will liquify the nerves of the sentient whilst you observe, your psyche withering in the onslaught of horror. Rege̿̔̉x-based HTML parsers are the cancer that is killing StackOverflow it is too late it is too late we cannot be saved the trangession of a chi͡ld ensures regex will consume all living tissue (except for HTML which it cannot, as previously prophesied) dear lord help us how can anyone survive this scourge using regex to parse HTML has doomed humanity to an eternity of dread torture and security holes using regex as a tool to process HTML establishes a breach between this world and the dread realm of c͒ͪo͛ͫrrupt entities (like SGML entities, but more corrupt) a mere glimpse of the world of regex parsers for HTML will instantly transport a programmer’s consciousness into a world of ceaseless screaming, he comes, the pestilent slithy regex-infection will devour your HTML parser, application and existence for all time like Visual Basic only worse he comes he comes do not fight he com̡e̶s, ̕h̵is un̨ho͞ly radiańcé destro҉ying all enli̍̈́̂̈́ghtenment, HTML tags lea͠ki̧n͘g fr̶ǫm ̡yo͟ur eye͢s̸ ̛l̕ik͏e liquid pain, the song of re̸gular expression parsing will extinguish the voices of mortal man from the sphere I can see it can you see ̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ it is beautiful the final snuffing of the lies of Man ALL IS LOŚ͖̩͇̗̪̏̈́T ALL IS LOST the pon̷y he comes he c̶̮omes he comes the ichor permeates all MY FACE MY FACE ᵒh god no NO NOO̼OO NΘ stop the an*̶͑̾̾̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e not rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ

Have you tried using an XML parser instead?

How to determine which, if any CSS rules are unused on a site

So this is a really interesting way to determine which, if any CSS rules are unused in a stylesheet, site-wide:

Part of this story could certainly be about deleting CSS that is determined to be “unused” in a project. I know there is incredible demand for this kind of tooling. I feel like there are some developers damn near frothing at the mouth to blast their CSS through some kind of fancy tool to strip away anything unneeded.

[…]

Here’s how one company I heard from was doing it:

  1. They injected a script onto the page for some subset of users.
  2. The script would look at the CSSOM and find every single selector in the CSS for that page.
  3. It would also run a querySelectorAll(“*”) and find every single DOM node on that page.
  4. It would compare those two sets and find all selectors that seemed to be unused.
  5. In order to get the best results, it would fire this script after a random amount of seconds, on a random set of users, in a random set of conditions. Even with this, it needed a lot of data over a long period of time.
  6. After that had run for long enough, there was a set of CSS selectors that seemed likely to be unused.
  7. To be sure, unique background images were applied to all those selectors.
  8. After applying those and waiting for another length of time, the server logs were checked to make sure those images were never accessed. If they were, that selector was used, and would have to stay.
    Ultimately, the unused selectors could safely be deleted from the CSS.

Whew! That’s an awful lot of work to remove some CSS.

But as you can imagine, it’s fairly safe. Imagine just checking one page’s CSS coverage. You’ll definitely find a bunch of unused CSS. One page, in one specific state, is not representative of your entire website.

The IndieWeb Movement Will Help People Control Their Own Web Presence?

This seems like a really good introduction to and rationale for the IndieWeb:

The early vision of the web was one of a decentralized and somewhat anarchic community where we each had control over our own content and our own online presence — that’s a vision that Tim Berners-Lee still endorses, but it’s one that’s put in jeopardy by the relentless centralizing tendency of big companies. And that’s why I find the Indie Web movement so interesting — not as a rejection of the corporate influence, but as a much needed counterbalance that provides the technology for people, should they so choose, to build an online presence of their own devising without giving up the communities and the connections that they have built on existing networks.

The Indie Web is the name given to a movement instigated by a group of technologists who want to put some distance between themselves and Silicon Valley. At the heart of the Indie Web are the IndieWebCamps, but I’d like to have a quick look at one of the central ideas motivating the creation of various technologies that help foster the same connectivity as social networks with a bit more freedom.

I have nothing to hide

I know you are not a terrorist but still you give privacy for understood in many aspects of your daily physical life. You expect it every time you go to the bathroom and close the door, you expect it when you go to the doctor or you confide to a close friend.

[…]

Everyone has things that keep to themselves, things that only say to a special person, things that can be shared with close friends or family, things they talk only to a doctor and so on and so forth.

If in real life we have so many layers, why don’t we expect the same level of privacy on internet? We are giving to Facebook, Google, Linkedin and the others more information about ourselves than we give to our SO or even to ourselves (remember Google is investing in DNA mapping).

Browse Against the Machine

[T]he web looks more and more like a feudal system, where the geography of the web has been partitioned off by the Frightful Five. Google, Facebook, Microsoft, Apple and Amazon are our lord and protectors, exacting a royal sum for our online behaviors. We’re the serfs and tenants, providing homage inside their walled fortresses. Noble upstarts are erased or subsumed under their existing order.

A Love Letter to CSS

When I tell coworkers of my unabated love for CSS they look at me like I’ve made an unfortunate life decision.

[…]

Sometimes I feel that developers, some of the most opinionated human beings on the planet, can only agree on one thing: that CSS is totally the worst.

[…]

But today I’m going to blow your mind. Today I’m going to try to convince you that not only is CSS one of the best technologies you use on a day-to-day basis, not only is CSS incredibly well designed, but that you should be thankful—thankful!—each and every time you open a .css file.

My argument is relatively simple: creating a comprehensive styling mechanism for building complex user interfaces is startlingly hard, and every alternative to CSS is much worse. Like, it’s not even close.

Responsive Design for Motion, a.k.a. "prefers-reduced-motion" Media Query

WebKit now supports the prefers-reduced-motion media feature, part of CSS Media Queries Level 5, User Preferences. The feature can be used in a CSS @media block or through the window.matchMedia() interface in JavaScript. Web designers and developers can use this feature to serve alternate animations that avoid motion sickness triggers experienced by some site visitors.

To explain who this media feature is for, and how it’s intended to work, we’ll cover some background. Skip directly to the code samples or prefers-reduced-motion demo if you wish.

Inclusive Toggle Buttons

How you design and implement your toggle buttons is quite up to you, but I hope you’ll remember this post when it comes to adding this particular component to your pattern library. There’s no reason why toggle buttons — or any interface component for that matter — should marginalize the number of people they often do.

You can take the basics explored here and add all sorts of design nuances, including animation. It’s just important to lay a solid foundation first.

Checklist

  • Use form elements such as checkboxes for on/off toggles if you are certain the user won’t believe they are for submitting data.
  • Use <button> elements, not links, with aria-pressed or role="switch" plus aria-checked.
  • Don’t change label and state together.
  • When using visual “on” and “off” text labels (or similar) you can override these with a unique label via aria-labelledby.
  • Be careful to make sure the contrast level between the button’s text and background color meets WCAG 2.0 requirements.

See the source link for very detailed descriptions of all of these points.