Understanding the ECMAScript spec, part 3

    Article overview: Juejin (ByteDance Frontend)   2021-04-22

    In this episode, we’ll go deeper into the definition of the ECMAScript language and its syntax. If you’re not familiar with context-free grammars, now is a good time to check out the basics, since the spec uses context-free grammars to define the language.

    ECMAScript grammars

    The ECMAScript spec defines four grammars:

    The lexical grammar describes how Unicode code points are translated into a sequence of input elements (tokens, line terminators, comments, white space).

    The syntactic grammar defines how syntactically correct programs are composed of tokens.

    The RegExp grammar describes how Unicode code points are translated into regular expressions.

    The numeric string grammar describes how Strings are translated into numeric values.

    Each grammar is defined as a context-free grammar, consisting of a set of productions.

    The grammars use slightly different notation: the syntactic grammar uses LeftHandSideSymbol : whereas the lexical grammar and the RegExp grammar use LeftHandSideSymbol :: and the numeric string grammar uses LeftHandSideSymbol :::.
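
To make the point that the numeric string grammar is a grammar of its own, here is a small sketch that feeds similar character sequences to the source-code grammar and to String-to-Number conversion. It relies only on standard Number() behavior; the variable names are illustrative.

```javascript
// A numeric literal in source code is a token of the lexical grammar,
// which accepts numeric separators such as 1_000.
const fromLiteral = 1_000;

// Number() applied to a String uses the numeric string grammar instead.
// That grammar has no separators, so the conversion yields NaN.
const fromString = Number('1_000');

// The numeric string grammar does accept surrounding white space and
// the empty string, neither of which is a valid numeric literal.
const padded = Number('  42  ');
const empty = Number('');

console.log(fromLiteral, fromString, padded, empty);
```

The same characters are accepted by one grammar and rejected by the other, which is why the spec keeps them separate.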

    Next we’ll look into the lexical grammar and the syntactic grammar in more detail.

    Lexical grammar

    The spec defines ECMAScript source text as a sequence of Unicode code points. For example, variable names are not limited to ASCII characters but can also include other Unicode characters. The spec doesn’t talk about the actual encoding (for example, UTF-8 or UTF-16). It assumes that the source code has already been converted into a sequence of Unicode code points according to the encoding it was in.
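
As a rough illustration of "a sequence of Unicode code points", the sketch below contrasts UTF-16 code units with code points for a made-up piece of source text. The string sourceText and the variable names are assumptions chosen for the example.

```javascript
// A non-ASCII identifier: Greek letters are allowed in variable names.
const π = 3.14159;

// Hypothetical source text containing U+1D465 (MATHEMATICAL ITALIC SMALL X),
// a character outside the Basic Multilingual Plane.
const sourceText = 'var \u{1D465} = 1;';

// String.prototype.length counts UTF-16 code units...
const codeUnitCount = sourceText.length;

// ...whereas iterating a string yields whole code points, which is the
// level at which the lexical grammar is defined.
const codePoints = Array.from(sourceText, (ch) => ch.codePointAt(0));

console.log(codeUnitCount, codePoints.length);
```

The two counts differ by one because U+1D465 takes two UTF-16 code units but is a single code point.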

    It’s not possible to tokenize ECMAScript source code in advance, which makes defining the lexical grammar slightly more complicated.

    For example, we cannot determine whether / is the division operator or the start of a RegExp without looking at the larger context it occurs in:

    const x = 10 / 5;
    

    Here / is a DivPunctuator.

    const r = /foo/;
    

    Here the first / is the start of a RegularExpressionLiteral.
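
The two cases can look almost identical on the page. In the sketch below (variable names are made up for the example), the character sequence b/g appears in both statements, yet only its position decides how each / is tokenized.

```javascript
const a = 10;
const b = 5;
const g = 1;

// After an identifier, `/` can only continue the expression,
// so both slashes are DivPunctuators: (a / b) / g.
const quotient = a / b / g;

// Where an expression must start, `/` can only begin a
// RegularExpressionLiteral: pattern "b" with the flag "g".
const pattern = /b/g;

console.log(quotient, pattern.source, pattern.flags);
```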

    Templates introduce a similar ambiguity: the interpretation of }` depends on the context it occurs in:

    const what1 = 'temp';
    const what2 = 'late';
    const t = `I am a ${ what1 + what2 }`;
    

    Here `I am a ${ is a TemplateHead and }` is a TemplateTail.

    if (0 == 1) {
    
    }`not very useful`;
    

    Here } is a RightBracePunctuator and ` is the start of a NoSubstitutionTemplate.

    Even though the interpretation of / and }` depends on their “context” (their position in the syntactic structure of the code), the grammars we’ll describe next are still context-free.

    The lexical grammar uses several goal symbols to distinguish between the contexts where some input elements are permitted and some are not. For example, the goal symbol InputElementDiv is used in contexts where / is a division and /= is a division-assignment. The InputElementDiv productions list the possible tokens which can be produced in this context:

    InputElementDiv ::
      WhiteSpace
      LineTerminator
      Comment
      CommonToken
      DivPunctuator
      RightBracePunctuator
    

    In this context, encountering / produces the DivPunctuator input element. Producing a RegularExpressionLiteral is not an option here.

    On the other hand, InputElementRegExp is the goal symbol for the contexts where / is the beginning of a RegExp:

    InputElementRegExp ::
      WhiteSpace
      LineTerminator
      Comment
      CommonToken
      RightBracePunctuator
      RegularExpressionLiteral
    

    As we see from the productions, it’s possible that this produces the RegularExpressionLiteral input element, but producing DivPunctuator is not possible.

    Similarly, there is another goal symbol, InputElementRegExpOrTemplateTail, for contexts where TemplateMiddle and TemplateTail are permitted, in addition to RegularExpressionLiteral. And finally, InputElementTemplateTail is the goal symbol for contexts where only TemplateMiddle and TemplateTail are permitted but RegularExpressionLiteral is not permitted.

    In implementations, the syntactic grammar analyzer (“parser”) may call the lexical grammar analyzer (“tokenizer” or “lexer”), passing the goal symbol as a parameter and asking for the next input element suitable for that goal symbol.
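
The interaction can be sketched roughly as follows. This is a deliberately simplified illustration, not how any real engine is written: the function nextInputElement, its parameters, and the returned object shape are all hypothetical, and it only handles input elements that start with /.

```javascript
// Simplified sketch, not a conforming lexer: it recognizes only `/`, `/=`
// and a naive regular expression literal, and returns null for anything else.
function nextInputElement(source, position, goal) {
  if (source[position] !== '/') return null;

  if (goal === 'InputElementRegExp') {
    // Naive scan to the closing slash; ignores escapes, classes and flags.
    const end = source.indexOf('/', position + 1);
    if (end === -1) throw new SyntaxError('Unterminated regular expression');
    return {
      type: 'RegularExpressionLiteral',
      text: source.slice(position, end + 1),
    };
  }

  if (goal === 'InputElementDiv') {
    const text = source[position + 1] === '=' ? '/=' : '/';
    return { type: 'DivPunctuator', text };
  }

  throw new Error(`Unsupported goal symbol: ${goal}`);
}

// The parser has just seen the operand `10`, so it asks for a division.
const div = nextInputElement('10 / 5', 3, 'InputElementDiv');

// After `=`, an expression must start, so it asks for a RegExp.
const re = nextInputElement('r = /foo/', 4, 'InputElementRegExp');

console.log(div, re);
```

The lexer never has to guess: the caller, which knows the syntactic position, picks the goal symbol.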

    Syntactic grammar

    We looked into the lexical grammar, which defines how we construct tokens from Unicode code points. The syntactic grammar builds on it: it defines how syntactically correct programs are composed of tokens.

    Example: Allowing legacy identifiers

    Introducing a new keyword to the grammar is a possibly breaking change — what if existing code already uses the keyword as an identifier?

    For example, before await was a keyword, someone might have written the following code:

    function old() {
      var await;
    }
    

    The ECMAScript grammar carefully added the await keyword in such a way that this code continues to work. Inside async functions, await is a keyword, so this doesn’t work:

    async function modern() {
      var await; // Syntax error
    }
    

    Allowing yield as an identifier in non-generators and disallowing it in generators works similarly.
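
One way to try these four cases without crashing a script is to hand the source to the Function constructor, which parses its argument as sloppy-mode code and throws a SyntaxError on failure. The helper parses below is a hypothetical name introduced for this sketch, and it assumes an environment where the Function constructor is available (for example Node.js).

```javascript
// Returns true if `source` parses as a function body, false on a SyntaxError.
// The Function constructor only parses and compiles; it does not call the
// resulting function.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

const results = {
  awaitInFunction: parses('function old() { var await; }'),
  awaitInAsyncFunction: parses('async function modern() { var await; }'),
  yieldInFunction: parses('function old() { var yield; }'),
  yieldInGenerator: parses('function* modern() { var yield; }'),
};

console.log(results);
```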

    Understanding how await is allowed as an identifier requires understanding ECMAScript-specific syntactic grammar notation. Let’s dive right in!

    Productions and shorthands

    Let’s look at how the productions for VariableStatement are defined. At first glance, the grammar can look a bit scary:

    VariableStatement[Yield, Await] :
      var VariableDeclarationList[+In, ?Yield, ?Await] ;
    

    What do the subscripts ([Yield, Await]) and prefixes (+ in +In and ? in ?Await) mean?

    The notation is explained in the section Grammar Notation.

    The subscripts are a shorthand for expressing a set of productions, for a set of left-hand side symbols, all at once. The left-hand side symbol has two parameters, which expands into four "real" left-hand side symbols: VariableStatement, VariableStatement_Yield, VariableStatement_Await, and VariableStatement_Yield_Await.

    Note that here the plain VariableStatement means “VariableStatement without _Await and _Yield”. It should not be confused with VariableStatement[Yield, Await].

    On the right-hand side of the production, we see the shorthand +In, meaning "use the version with _In", and ?Await, meaning “use the version with _Await if and only if the left-hand side symbol has _Await” (similarly with ?Yield).

    The third shorthand, ~Foo, meaning “use the version without _Foo”, is not used in this production.
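
The three prefixes can be read as a tiny lookup rule. The sketch below encodes that rule; resolveArgument and resolveSymbol are hypothetical helper names, and the underscore-suffixed names follow the informal naming used in this article rather than anything in the spec text itself.

```javascript
// Resolves one right-hand side argument such as "+In", "?Await" or "~Yield"
// against the set of parameters present on the left-hand side symbol.
// Returns the suffix to append ("_In") or "" if the parameter is absent.
function resolveArgument(argument, activeParameters) {
  const prefix = argument[0];
  const name = argument.slice(1);
  switch (prefix) {
    case '+': return `_${name}`; // always use the version with the parameter
    case '~': return '';         // always use the version without it
    case '?': return activeParameters.has(name) ? `_${name}` : '';
    default: throw new Error(`Unknown prefix in ${argument}`);
  }
}

// Expands a parameterized right-hand side symbol into its "real" name.
function resolveSymbol(name, args, activeParameters) {
  return name + args.map((arg) => resolveArgument(arg, activeParameters)).join('');
}

const args = ['+In', '?Yield', '?Await'];
const inAsync = resolveSymbol('VariableDeclarationList', args, new Set(['Await']));
const inPlain = resolveSymbol('VariableDeclarationList', args, new Set());

console.log(inAsync, inPlain);
```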

    With this information, we can expand the productions like this:

    VariableStatement :
      var VariableDeclarationList_In ;

    VariableStatement_Yield :
      var VariableDeclarationList_In_Yield ;

    VariableStatement_Await :
      var VariableDeclarationList_In_Await ;

    VariableStatement_Yield_Await :
      var VariableDeclarationList_In_Yield_Await ;
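
The left-hand side expansion is mechanical: n parameters give 2^n symbols. A minimal sketch, assuming the underscore naming used in this article (expandLeftHandSide is a hypothetical helper, not something the spec defines):

```javascript
// Expands a left-hand side symbol with n parameters into its 2^n variants.
function expandLeftHandSide(name, parameters) {
  const variants = [];
  for (let mask = 0; mask < 2 ** parameters.length; mask++) {
    // Bit i of the mask decides whether parameter i is present.
    const suffix = parameters
      .filter((_, i) => mask & (1 << i))
      .map((parameter) => `_${parameter}`)
      .join('');
    variants.push(name + suffix);
  }
  return variants;
}

const variants = expandLeftHandSide('VariableStatement', ['Yield', 'Await']);
console.log(variants);
```

Adding a third parameter would double the list again, which is why the spec writes the parameterized form instead of spelling out every variant.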
    

    Ultimately, we need to find out two things:

    1. Where is it decided whether we’re in the case with _Await or without _Await?
    2. Where does it make a difference: where do the productions for Something_Await and Something (without _Await) diverge?

    _Await or no _Await

    Let’s tackle question 1 first. It’s somewhat easy to guess that non-async functions and async functions differ in whether we pick the parameter _Await for the function body or not. Reading the productions for async function declarations, we find this:

    AsyncFunctionBody :
      FunctionBody[~Yield, +Await]
    

    Note that AsyncFunctionBody has no parameters: they get added to the FunctionBody on the right-hand side.

    If we expand this production, we get:

    AsyncFunctionBody :
      FunctionBody_Await
    

    In other words, async functions have FunctionBody_Await, meaning a function body where await is treated as a keyword.

    On the other hand, if we’re inside a non-async function, the relevant production is:

    FunctionDeclaration[Yield, Await, Default] :
      function BindingIdentifier[?Yield, ?Await] ( FormalParameters[~Yield, ~Await] ) { FunctionBody[~Yield, ~Await] }
    

    (FunctionDeclaration has another production, but it’s not relevant for our code example.)

    To avoid combinatorial expansion, let’s ignore the Default parameter, which is not used in this particular production.

    The expanded form of the production is:

    FunctionDeclaration :
      function BindingIdentifier ( FormalParameters ) { FunctionBody }

    FunctionDeclaration_Yield :
      function BindingIdentifier_Yield ( FormalParameters ) { FunctionBody }

    FunctionDeclaration_Await :
      function BindingIdentifier_Await ( FormalParameters ) { FunctionBody }

    FunctionDeclaration_Yield_Await :
      function BindingIdentifier_Yield_Await ( FormalParameters ) { FunctionBody }
    

    In this production we always get FunctionBody and FormalParameters (without _Yield and without _Await), since they are parameterized with [~Yield, ~Await] in the non-expanded production.

    The function name is treated differently: it gets the parameters _Await and _Yield if the left-hand side symbol has them.

    To summarize: Async functions have a FunctionBody_Await and non-async functions have a FunctionBody (without _Await). Since we’re talking about non-generator functions, both our async example function and our non-async example function are parameterized without _Yield.
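
A consequence worth checking is that the parameter is reset at every function boundary: a plain function nested inside an async function gets a FunctionBody without _Await again. The sketch below tests this with a hypothetical parses helper built on the Function constructor, assuming an environment where that constructor is available.

```javascript
// Returns true if `source` parses as sloppy-mode code, false on a SyntaxError.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// The inner function is not async, so its body is FunctionBody[~Yield, ~Await]
// and `await` is an ordinary identifier there.
const nestedPlain = parses(
  'async function outer() { function inner() { var await; } }');

// The inner function is async, so its body is FunctionBody_Await,
// no matter what kind of function surrounds it.
const nestedAsync = parses(
  'function outer() { async function inner() { var await; } }');

console.log(nestedPlain, nestedAsync);
```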

    Maybe it’s hard to remember which one is FunctionBody and which FunctionBody_Await. Is FunctionBody_Await for a function where await is an identifier, or for a function where await is a keyword?

    You can think of the _Await parameter as meaning "await is a keyword". This approach is also future-proof. Imagine a new keyword, blob, being added, but only inside "blobby" functions. Non-blobby non-async non-generators would still have FunctionBody (without _Await, _Yield or _Blob), exactly like they have now. Blobby functions would have a FunctionBody_Blob, async blobby functions would have FunctionBody_Await_Blob and so on. We’d still need to add the Blob subscript to the productions, but the expanded forms of FunctionBody for already existing functions stay the same.

    Disallowing await as an identifier

    Next, we need to find out how await is disallowed as an identifier if we're inside a FunctionBody_Await.

    We can follow the productions further to see that the _Await parameter gets carried unchanged from FunctionBody all the way to the VariableStatement production we were previously looking at.

    Thus, inside an async function, we’ll have a VariableStatement_Await and inside a non-async function, we’ll have a VariableStatement.

    We can follow the productions further and keep track of the parameters. We already saw the productions for VariableStatement:

    VariableStatement[Yield, Await] :
      var VariableDeclarationList[+In, ?Yield, ?Await] ;
    

    All productions for VariableDeclarationList just carry the parameters on as is:

    VariableDeclarationList[In, Yield, Await] :
      VariableDeclaration[?In, ?Yield, ?Await]

    (Here we show only the production relevant to our example.)

    VariableDeclaration[In, Yield, Await] :
      BindingIdentifier[?Yield, ?Await] Initializer[?In, ?Yield, ?Await] opt
    

    The opt shorthand means that the right-hand side symbol is optional; there are in fact two productions, one with the optional symbol, and one without.
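
The expansion of opt can be sketched in a few lines. The helper expandOptional and the trailing "?" used to mark an optional symbol are conventions invented for this example; the spec itself uses the opt subscript.

```javascript
// Expands a right-hand side containing optional symbols into plain
// alternatives. Symbols are strings; a trailing "?" marks an optional one
// (an ad hoc stand-in for the "opt" subscript).
function expandOptional(rightHandSide) {
  let alternatives = [[]];
  for (const symbol of rightHandSide) {
    if (symbol.endsWith('?')) {
      const name = symbol.slice(0, -1);
      // Each existing alternative forks: one with the symbol, one without.
      alternatives = alternatives.flatMap((alt) => [[...alt, name], alt]);
    } else {
      alternatives = alternatives.map((alt) => [...alt, symbol]);
    }
  }
  return alternatives;
}

const expanded = expandOptional(['BindingIdentifier', 'Initializer?']);
console.log(expanded);
```

For VariableDeclaration this gives one alternative with the Initializer and one without; our example code uses the second one.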

    In the simple case relevant to our example, VariableStatement consists of the keyword var, followed by a single BindingIdentifier without an initializer, and ending with a semicolon.

    To disallow or allow await as a BindingIdentifier, we hope to end up with something like this:

    BindingIdentifier_Await :
      Identifier
      yield

    BindingIdentifier :
      Identifier
      yield
      await
    

    This would disallow await as an identifier inside async functions and allow it as an identifier inside non-async functions.

    But the spec doesn’t define it like this. Instead, we find this production:

    BindingIdentifier[Yield, Await] :
      Identifier
      yield
      await
    

    Expanded, this means the following productions:

    BindingIdentifier_Await :
      Identifier
      yield
      await

    BindingIdentifier :
      Identifier
      yield
      await
    

    (We’re omitting the productions for BindingIdentifier_Yield and BindingIdentifier_Yield_Await which are not needed in our example.)

    This looks like await and yield would always be allowed as identifiers. What’s up with that? Is the whole blog post for nothing?

    Static semantics to the rescue

    It turns out that static semantics are needed for forbidding await as an identifier inside async functions.

    Static semantics describe static rules, that is, rules that are checked before the program runs.

    In this case, the static semantics for BindingIdentifier define the following syntax-directed rule:

    BindingIdentifier[Yield, Await] : await

    It is a Syntax Error if this production has an [Await] parameter.

    Effectively, this forbids the BindingIdentifier_Await : await production.
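
One way to picture this is as a second pass that runs after the grammar has already accepted the input. The sketch below models only the BindingIdentifier rules for await and, by analogy, yield; the function name, its arguments, and the error messages are hypothetical and not taken from the spec or any engine.

```javascript
// Sketch of an early-error check that runs after parsing has succeeded.
// `alternative` is the matched right-hand side ("Identifier", "yield" or
// "await"); `parameters` is the set of parameters on the left-hand side.
function bindingIdentifierEarlyErrors(alternative, parameters) {
  const errors = [];
  if (alternative === 'await' && parameters.has('Await')) {
    errors.push('await cannot be used as an identifier here');
  }
  // The analogous rule for yield inside generators.
  if (alternative === 'yield' && parameters.has('Yield')) {
    errors.push('yield cannot be used as an identifier here');
  }
  return errors;
}

// BindingIdentifier_Await : await  -> rejected by the static semantics.
const inAsyncFunction = bindingIdentifierEarlyErrors('await', new Set(['Await']));

// BindingIdentifier : await  -> accepted.
const inPlainFunction = bindingIdentifierEarlyErrors('await', new Set());

console.log(inAsyncFunction, inPlainFunction);
```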

    The spec explains why this production exists and is then turned into a Syntax Error by the static semantics: forbidding it in the grammar itself would interfere with automatic semicolon insertion (ASI).

    Remember that ASI kicks in when we’re unable to parse a line of code according to the grammar productions. ASI tries to add semicolons to satisfy the requirement that statements and declarations must end with a semicolon. (We’ll describe ASI in more detail in a later episode.)

    Consider the following code (example from the spec):

    async function too_few_semicolons() {
      let
      await 0;
    }
    

    If the grammar disallowed await as an identifier, ASI would kick in and transform the code into the following grammatically correct code, which also uses let as an identifier:

    async function too_few_semicolons() {
      let;
      await 0;
    }
    

    This kind of interference with ASI was deemed too confusing, so static semantics were used for disallowing await as an identifier.
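
The outcome can be checked with a small experiment. The parses helper is a hypothetical name for this sketch; it uses the Function constructor, which parses its argument as sloppy-mode code, and assumes an environment where that constructor is available.

```javascript
// Returns true if `source` parses as sloppy-mode code, false on a SyntaxError.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) return false;
    throw error;
  }
}

// `let` followed by a line break and `await 0;` inside an async function.
// The grammar matches `let await` as a declaration, so ASI does not step in,
// and the static semantics then reject `await` as the binding name.
const withoutSemicolon = parses('async function f() {\n  let\n  await 0;\n}');

// With the semicolon written out, `let` is an identifier expression and
// `await 0` is an await expression.
const withSemicolon = parses('async function f() {\n  let;\n  await 0;\n}');

console.log(withoutSemicolon, withSemicolon);
```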

    Disallowed StringValues of identifiers

    There’s also another related rule:

    BindingIdentifier : Identifier

    It is a Syntax Error if this production has an [Await] parameter and StringValue of Identifier is "await".

    This might be confusing at first. Identifier is defined like this:

    Identifier :
      IdentifierName but not ReservedWord

    await is a ReservedWord, so how can an Identifier ever be await?

    As it turns out, Identifier cannot be await, but it can be something else whose StringValue is "await": a different representation of the character sequence await.

    Static semantics for identifier names define how the StringValue of an identifier name is computed. For example, the Unicode escape sequence for a is \u0061, so \u0061wait has the StringValue "await". \u0061wait won’t be recognized as a keyword by the lexical grammar; instead it will be an Identifier. The static semantics forbid using it as a variable name inside async functions.

    So this works:

    function old() {
      var \u0061wait;
    }
    

    And this doesn’t:

    async function modern() {
      var \u0061wait; // Syntax error
    }
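
The rule can be pictured as "decode the escapes first, then compare". The sketch below is a simplified model: stringValue only handles \uXXXX and \u{...} escapes, and both function names are hypothetical rather than spec operations with those exact names.

```javascript
// Simplified StringValue: decodes \uXXXX and \u{X...} escapes in the
// source text of an identifier and leaves everything else untouched.
function stringValue(identifierText) {
  return identifierText.replace(
    /\\u\{([0-9a-fA-F]+)\}|\\u([0-9a-fA-F]{4})/g,
    (_, braced, plain) => String.fromCodePoint(parseInt(braced || plain, 16)));
}

// Early-error sketch for `BindingIdentifier : Identifier`.
function isAllowedAsBinding(identifierText, hasAwaitParameter) {
  return !(hasAwaitParameter && stringValue(identifierText) === 'await');
}

// The identifier exactly as it appears in source: backslash, u, 0061, wait.
const escaped = '\\u0061wait';

console.log(stringValue(escaped));
console.log(isAllowedAsBinding(escaped, false), isAllowedAsBinding(escaped, true));
```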
    

    Summary

    In this episode, we familiarized ourselves with the lexical grammar, the syntactic grammar, and the shorthands used for defining the syntactic grammar. As an example, we looked into forbidding using await as an identifier inside async functions but allowing it inside non-async functions.

    Other interesting parts of the syntactic grammar, such as automatic semicolon insertion and cover grammars, will be covered in a later episode. Stay tuned!

