As long as we’re filling out our fantasy browser brackets, I’m hoping that the Servo engine and browser(s) can become viable. Servo was started at Mozilla as a web rendering engine only, before they laid off the whole team and the Linux Foundation took over the project. Basically revived from the dead in 2023, the current project is working on an engine and a demonstration browser that uses it. It’s years away from being a usable replacement for current browsers, and the engine is certainly the main focus. A separate browser that employs Servo as its engine is a more likely future than an actual Servo browser.
Still, you can download a demo build of the official browser from the web site. Currently, it’s only usable for very simple web sites. Even Lemmy/Mbin rendering is a little broken, and I think of those as fairly basic. YouTube is out of the question. One of the sites that’s been used to demonstrate its ability to render web pages is the web site for Space Jam (1996), if that gives you any idea of its current state.
Honest question, since I have no clue about web/browser engines other than being able to maybe name 4-5 of them (Ladybird, Servo, WebKit, Gecko, … shit, what was Chromium’s called again?):
What makes browsers/browser engines so difficult that they need millions upon millions of LOC?
Naively thinking, it’s “just” XML + CSS + JS, right? (Edit: and then the networking stack/hyperlinks)
So what am I missing? (Since I’m obviously either forgetting something and/or underestimating how difficult engines for the aforementioned three are to build…)
What makes implementation so difficult is that browsers can’t just “work”; they need to be correct in what they do, and they need to support all websites.
The HTML, CSS, and JS standards have developed over a long time. Not only is the amount of stuff massive, but over the years some strange features were implemented, which website developers then used, and now all of these need to be handled correctly by every new browser.
Emulating and reimplementing existing stuff is often harder than building something new, especially when you cannot leave out any feature, no matter how obscure, because that might break someone’s website.
JavaScript alone is not a simple beast. The engine has to be fast enough for modern JavaScript web apps, so it needs a JIT compiler; it also needs sandboxing, plus all of the standard web APIs it has to implement. All of this also needs to be robust. Browsers ingest the majority of what people see on the Internet, and they have to handle every single edge case gracefully. Robust software is actually incredibly difficult, and good error handling often adds a lot more code complexity. Security in a browser is also not easy: you’re parsing a bunch of different untrusted HTML, CSS, and JavaScript, and you’re executing untrusted code.
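To make the “handle every edge case gracefully” point concrete, here’s a toy Python sketch (nothing like a real engine’s spec-defined error recovery) showing how an HTML parser has to keep going on malformed markup instead of rejecting it:

```python
from html.parser import HTMLParser

# A real browser never rejects bad markup; it recovers and keeps going.
# This tiny parser just logs what it sees, even from broken input.
class LoggingParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print(f"open  <{tag}> attrs={attrs}")

    def handle_endtag(self, tag):
        print(f"close </{tag}>")

    def handle_data(self, data):
        if data.strip():
            print(f"text  {data.strip()!r}")

# Unclosed tags, a stray </b>, an unquoted attribute: all must be tolerated.
broken = '<p class=intro>Hello <b>world</p></b><div>more'
LoggingParser().feed(broken)
```

A real browser goes much further than this: the HTML spec defines exactly which DOM tree every possible piece of broken markup must produce, so that all engines agree with each other.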
Then there is the monster that is CSS and layout. I can’t imagine being one of the people who have to write code dealing with that; it’d drive me crazy.
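To give a tiny taste of the cascade logic hiding in there, here’s a rough Python sketch of specificity scoring for simple selectors; real selector matching, cascade origins, !important, and actual layout are each far bigger than this:

```python
import re

# Rough specificity for a single compound selector: (ids, classes/attrs/pseudo-classes, types).
# Real engines also handle combinators, :not(), :is(), namespaces, layers, !important...
def specificity(selector: str) -> tuple[int, int, int]:
    ids = len(re.findall(r'#[\w-]+', selector))
    classes = len(re.findall(r'\.[\w-]+|\[[^\]]*\]|:(?!:)[\w-]+', selector))
    # Type selectors: bare element names at the start or after whitespace/combinators.
    types = len(re.findall(r'(?:^|[\s>+~])([a-zA-Z][\w-]*)', selector))
    return (ids, classes, types)

rules = ['p', '.intro', 'p.intro', '#main p', 'nav a:hover']
for sel in sorted(rules, key=specificity, reverse=True):
    print(specificity(sel), sel)
```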
Then there are all of the image formats, HTML5 canvases, videos, PDFs, etc. These all have to be parsed safely and displayed correctly as well.
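One small, concrete piece of that work is content sniffing: figuring out what a blob of bytes actually is before handing it to a decoder, since the server’s Content-Type header can be wrong. A hedged Python sketch covering just a few formats (browsers follow a whole MIME Sniffing spec for this):

```python
# Magic-byte prefixes mapped to formats; a tiny subset of what a browser checks.
MAGIC = {
    b'\x89PNG\r\n\x1a\n': 'image/png',
    b'\xff\xd8\xff': 'image/jpeg',
    b'GIF87a': 'image/gif',
    b'GIF89a': 'image/gif',
    b'%PDF-': 'application/pdf',
}

def sniff(data: bytes) -> str:
    """Guess a type from leading bytes; Content-Type headers can lie."""
    for prefix, mime in MAGIC.items():
        if data.startswith(prefix):
            return mime
    # WebP: RIFF container with 'WEBP' at offset 8.
    if data[:4] == b'RIFF' and data[8:12] == b'WEBP':
        return 'image/webp'
    return 'application/octet-stream'

print(sniff(b'\x89PNG\r\n\x1a\n' + b'\x00' * 16))  # image/png
print(sniff(b'%PDF-1.7 ...'))                      # application/pdf
```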
There is also the entire HTTP spec, which I didn’t even think to bring up. Yikes, is that a monster too; you have to support every version. Then there is all of the networking state, plus TLS and PKI.
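Just to make the layering visible, here’s a minimal Python sketch of a single HTTPS request using only the standard library; example.com is just a placeholder, and a browser carries far heavier equivalents of every step here (DNS, TCP, the TLS handshake with certificate/PKI checks, then HTTP itself), across HTTP/1.1, HTTP/2, and HTTP/3:

```python
import socket, ssl

host = 'example.com'  # placeholder host for illustration

# TLS context that verifies certificates against the system trust store (the PKI part).
ctx = ssl.create_default_context()

with socket.create_connection((host, 443), timeout=10) as tcp:       # DNS + TCP
    with ctx.wrap_socket(tcp, server_hostname=host) as tls:          # TLS handshake + cert check
        print('negotiated:', tls.version(), tls.cipher()[0])
        # A bare-bones HTTP/1.1 request; real browsers also speak HTTP/2 and HTTP/3.
        tls.sendall(f'GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n'.encode())
        response = b''
        while chunk := tls.recv(4096):
            response += chunk

print(response.split(b'\r\n', 1)[0].decode())  # status line, e.g. "HTTP/1.1 200 OK"
```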
There is likely so much that I’m still leaving out, like how all of this also has to be cross-platform and sometimes even cross-architecture.
Adding on to this: while this article is fast approaching 20 years old, it gets into the quagmire that is web standards, and how ~10 (now ~30) years of untrained amateurs (and/or professionals) applying their own interpretations of what the web standards mean, plus another decade or so before that in which there were no standards at all, has led to browsers needing to gracefully handle millions of contradictory instructions coming from different authors’ web sites.
Here’s a bonus: the W3C standards page. Try scrolling down it.
Thanks for these explanations, that makes a lot more sense now. I didn’t even think to consider that browsers might be using something other than an off-the-shelf implementation for image and other file formats, lol.
Sorry, I didn’t mean to imply they don’t use shared libs; they definitely do, but they still have to integrate them into the larger system and put consistent interfaces over them.
Yeah, I realize that. My go-to comparison would be PDF: where Firefox has PDF.js (I think?), Chromium just… implements seemingly the entire (exhaustive!) standard.
Well… according to Ladybird, at this point they are more conformant with web standards than Servo…
Does the ability to view websites other than Space Jam ’96 really improve your life?
I will give you that.
Servo is still making quick progress though.
https://servo.org/wpt/