Hand-Writing a JVM (12): The Interpreter

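The code and content are based on Zhang Xiuhong's book 自己动手写Java虚拟机 (Write Your Own Java Virtual Machine, Java Core Technology Series) and Song Hongkang's complete JVM course from Shangguigu.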

Let's start by writing a simple interpreter. For now it can only execute a single Java method; we will keep improving it in later chapters.

1. Overall code

Create an interpreter.go file in the / directory and define the interpret() function in it:

package main

import (
    "fmt"
    "jvmgo/ch05/classfile"
    "jvmgo/ch05/instructions"
    "jvmgo/ch05/instructions/base"
    "jvmgo/ch05/rtda"
)


func interpret(methodInfo *classfile.MemberInfo) {
    codeAttr := methodInfo.CodeAttribute()
    maxLocals := codeAttr.MaxLocals()
    maxStack := codeAttr.MaxStack()
    bytecode := codeAttr.Code()

    thread := rtda.NewThread()
    frame := thread.NewFrame(maxLocals, maxStack)
    thread.PushFrame(frame)

    defer catchErr(frame)
    loop(thread, bytecode)
}

func catchErr(frame *rtda.Frame) {
    if r := recover(); r != nil {
        fmt.Printf("LocalVars:%v\n", frame.LocalVars())
        fmt.Printf("OperandStack:%v\n", frame.OperandStack())
        panic(r)
    }
}

func loop(thread *rtda.Thread, bytecode []byte) {
    frame := thread.PopFrame()
    reader := &base.BytecodeReader{}

    for {
        pc := frame.NextPC()
        thread.SetPC(pc)

        // decode
        reader.Reset(bytecode, pc)
        opcode := reader.ReadUint8()
        inst := instructions.NewInstruction(opcode)
        inst.FetchOperands(reader)
        frame.SetNextPC(reader.PC())

        // execute
        fmt.Printf("pc:%2d inst:%T %v\n", pc, inst, inst)
        inst.Execute(frame)
    }
}

 

The parameter of interpret() is a MemberInfo pointer. Calling the CodeAttribute() method on MemberInfo returns the method's Code attribute.

CodeAttribute() is a newly added method in ch05\classfile\member_info.go:

func (self *MemberInfo) CodeAttribute() *CodeAttribute {
    for _, attrInfo := range self.attributes {
        // bind the concrete type in the switch instead of asserting twice
        switch attr := attrInfo.(type) {
        case *CodeAttribute:
            return attr
        }
    }
    return nil
}
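CodeAttribute() picks one entry out of a slice of interface values by its dynamic type. The sketch below shows the same lookup on its own. AttributeInfo, CodeAttribute and SourceFileAttribute here are simplified stand-ins, not the real classfile types. The sketch uses a comma-ok type assertion, which is equivalent to a single-case type switch. A nil result is a normal case, because abstract and native methods have no Code attribute.

```go
package main

import "fmt"

// AttributeInfo is a hypothetical stand-in for classfile.AttributeInfo.
type AttributeInfo interface {
	readInfo()
}

// CodeAttribute keeps only the fields the interpreter needs.
type CodeAttribute struct {
	maxStack  uint16
	maxLocals uint16
	code      []byte
}

func (*CodeAttribute) readInfo() {}

// SourceFileAttribute is another attribute kind the lookup must skip.
type SourceFileAttribute struct{ sourceFileIndex uint16 }

func (*SourceFileAttribute) readInfo() {}

// findCode returns the first Code attribute, or nil if there is none.
func findCode(attrs []AttributeInfo) *CodeAttribute {
	for _, attr := range attrs {
		if code, ok := attr.(*CodeAttribute); ok {
			return code
		}
	}
	return nil
}

func main() {
	attrs := []AttributeInfo{
		&SourceFileAttribute{sourceFileIndex: 7},
		&CodeAttribute{maxStack: 2, maxLocals: 3, code: []byte{0x03, 0x3c}},
	}
	if code := findCode(attrs); code != nil {
		fmt.Println(code.maxLocals, code.maxStack, len(code.code)) // 3 2 2
	}
	fmt.Println(findCode(nil) == nil) // true
}
```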

 

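With the Code attribute in hand, we can obtain what the method needs to run:
- the size of the local variable table (max_locals);
- the size of the operand stack (max_stack);
- the bytecode itself.

The rest of interpret() creates a Thread instance, then creates a frame and pushes it onto the top of the Java virtual machine stack, and finally executes the method. The complete function is the interpret() listing at the start of this section.

NewFrame() is a newly added method on the Thread struct, in ch05\rtda\thread.go: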

func (self *Thread) NewFrame(maxLocals, maxStack uint) *Frame {
    return newFrame(self, maxLocals, maxStack)
}
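javac computes max_locals and max_stack ahead of time, so a frame can allocate both areas once, at a fixed size, and never grow them. The following is a minimal sketch under simplifying assumptions. Slot is reduced to int32; the book's slots also hold object references, and long/double values take two slots. The type names mirror rtda only for readability.

```go
package main

import "fmt"

// Slot is simplified to int32 for this sketch.
type Slot int32

type OperandStack struct {
	size  uint
	slots []Slot
}

func newOperandStack(maxStack uint) *OperandStack {
	return &OperandStack{slots: make([]Slot, maxStack)}
}

// PushInt writes into the preallocated slots; it panics (index out of range)
// if more than maxStack values are pushed.
func (s *OperandStack) PushInt(v int32) {
	s.slots[s.size] = Slot(v)
	s.size++
}

func (s *OperandStack) PopInt() int32 {
	s.size--
	return int32(s.slots[s.size])
}

type Frame struct {
	localVars    []Slot
	operandStack *OperandStack
}

// newFrame sizes both areas from the Code attribute values.
func newFrame(maxLocals, maxStack uint) *Frame {
	return &Frame{
		localVars:    make([]Slot, maxLocals),
		operandStack: newOperandStack(maxStack),
	}
}

func main() {
	// Hypothetical Code attribute values: max_locals=3, max_stack=2.
	f := newFrame(3, 2)
	f.operandStack.PushInt(40)
	f.operandStack.PushInt(2)
	sum := f.operandStack.PopInt() + f.operandStack.PopInt()
	f.localVars[1] = Slot(sum)
	fmt.Println(f.localVars, f.operandStack.size) // [0 42 0] 0
}
```

Pushing a third value onto this two-slot stack would panic. Class files that pass verification should never exceed max_stack.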

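The Frame struct changes as well. It gains two fields, thread and nextPC, along with their getters and setters. These fields were added mainly so that jump instructions can implement the Branch() logic. The newFrame() function changes accordingly. The new version of ch05\rtda\frame.go: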

package rtda

// stack frame
type Frame struct {
    lower        *Frame // stack is implemented as linked list
    localVars    LocalVars
    operandStack *OperandStack
    thread       *Thread
    nextPC       int // the next instruction after the call
}

func newFrame(thread *Thread, maxLocals, maxStack uint) *Frame {
    return &Frame{
        thread:       thread,
        localVars:    newLocalVars(maxLocals),
        operandStack: newOperandStack(maxStack),
    }
}

// getters & setters
func (self *Frame) LocalVars() LocalVars {
    return self.localVars
}
func (self *Frame) OperandStack() *OperandStack {
    return self.operandStack
}
func (self *Frame) Thread() *Thread {
    return self.thread
}
func (self *Frame) NextPC() int {
    return self.nextPC
}
func (self *Frame) SetNextPC(nextPC int) {
    self.nextPC = nextPC
}
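nextPC is what makes jumps possible. The decode step sets it to the address right after the current instruction, and a taken branch simply overwrites it. The sketch below assumes, as the book's base package does, that thread.PC() holds the address of the instruction being executed. Per the JVM specification, branch offsets are relative to the address of the branch opcode. Thread and Frame are reduced to the fields needed here.

```go
package main

import "fmt"

type Thread struct{ pc int }

func (t *Thread) PC() int { return t.pc }

func (t *Thread) SetPC(pc int) { t.pc = pc }

type Frame struct {
	thread *Thread
	nextPC int
}

func (f *Frame) Thread() *Thread { return f.thread }

func (f *Frame) SetNextPC(nextPC int) { f.nextPC = nextPC }

// Branch makes the jump target the next instruction to run:
// target = address of the current opcode + signed offset.
func Branch(frame *Frame, offset int) {
	pc := frame.Thread().PC()
	frame.SetNextPC(pc + offset)
}

func main() {
	thread := &Thread{}
	frame := &Frame{thread: thread}

	// A goto at pc 17 with offset -13, i.e. a loop back-edge.
	thread.SetPC(17)
	frame.SetNextPC(20) // what decoding sets: 17 plus the 3-byte goto
	Branch(frame, -13)
	fmt.Println(frame.nextPC) // 4
}
```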

 

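Back in interpret(): our interpreter has no graceful way to stop yet. Any method that runs to completion must eventually execute some return instruction, and we have not implemented the return instructions yet. Nor have we implemented field-access and method-invocation instructions such as getstatic and invokevirtual.

Execution is therefore bound to reach an unsupported opcode. At that point NewInstruction() panics, and control passes to the deferred catchErr() function. catchErr() prints the local variable table and the operand stack, which lets us observe the method's result, and then re-panics.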
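The recovery works because catchErr() is deferred in interpret(). A panic raised anywhere below it, including inside loop() and NewInstruction(), unwinds to interpret() and is caught there. The following standalone sketch shows the same pattern with a toy two-opcode machine. Unlike the book's catchErr(), this version turns the recovered value into an error instead of re-panicking, which makes it easier to test.

```go
package main

import "fmt"

// frame is reduced to two int32 slices for this sketch.
type frame struct {
	localVars    []int32
	operandStack []int32
}

// run executes operand-free opcodes in order and panics on anything unknown,
// mimicking the default branch of NewInstruction().
func run(f *frame, code []byte) {
	for _, op := range code {
		switch op {
		case 0x08: // iconst_5
			f.operandStack = append(f.operandStack, 5)
		case 0x3c: // istore_1
			n := len(f.operandStack)
			f.localVars[1] = f.operandStack[n-1]
			f.operandStack = f.operandStack[:n-1]
		default:
			panic(fmt.Errorf("Unsupported opcode: 0x%x!", op))
		}
	}
}

// interpret defers the handler before running, like interpret()/catchErr(),
// but returns the recovered value as an error instead of re-panicking.
func interpret(code []byte) (err error) {
	f := &frame{localVars: make([]int32, 2)}
	defer func() {
		if r := recover(); r != nil {
			fmt.Printf("LocalVars:%v\n", f.localVars)
			fmt.Printf("OperandStack:%v\n", f.operandStack)
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	run(f, code)
	return nil
}

func main() {
	// iconst_5; istore_1; return (0xb1 is not implemented here)
	err := interpret([]byte{0x08, 0x3c, 0xb1})
	fmt.Println(err)
}
```

Running it prints LocalVars:[0 5] and OperandStack:[], followed by recovered: Unsupported opcode: 0xb1!.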

 

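loop() repeats three steps until an error occurs: it computes the pc, decodes the instruction, and executes it. Decoding also records in the frame where the next instruction starts (frame.SetNextPC(reader.PC())), so a jump instruction can override that address while it executes.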
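The three steps can be seen in isolation in a toy stack machine. This is a hypothetical sketch, not the book's code. It supports only four opcodes, decodes the single bipush operand inline instead of through FetchOperands(), and stops at ireturn. It shows how the reader's position after decoding becomes nextPC. There are no jumps here, but a taken jump would overwrite nextPC during the execute step.

```go
package main

import "fmt"

// reader is a cut-down stand-in for base.BytecodeReader: it advances pc as it reads.
type reader struct {
	code []byte
	pc   int
}

func (r *reader) readUint8() uint8 {
	b := r.code[r.pc]
	r.pc++
	return b
}

func (r *reader) readInt8() int8 { return int8(r.readUint8()) }

// run repeats "compute pc, decode, execute" until it reaches ireturn.
func run(code []byte) int32 {
	var stack []int32
	r := &reader{code: code}
	nextPC := 0
	for {
		// 1. compute pc
		pc := nextPC

		// 2. decode: read the opcode and any operands; the reader then sits at nextPC
		r.pc = pc
		opcode := r.readUint8()
		var operand int32
		if opcode == 0x10 { // bipush carries one signed byte
			operand = int32(r.readInt8())
		}
		nextPC = r.pc

		// 3. execute
		switch opcode {
		case 0x06: // iconst_3
			stack = append(stack, 3)
		case 0x10: // bipush
			stack = append(stack, operand)
		case 0x60: // iadd
			n := len(stack)
			stack = append(stack[:n-2], stack[n-2]+stack[n-1])
		case 0xac: // ireturn: hand back the int on top of the stack
			return stack[len(stack)-1]
		default:
			panic(fmt.Errorf("Unsupported opcode: 0x%x!", opcode))
		}
		fmt.Printf("pc:%2d opcode:0x%02x stack:%v\n", pc, opcode, stack)
	}
}

func main() {
	// bipush 10; iconst_3; iadd; ireturn
	fmt.Println(run([]byte{0x10, 0x0a, 0x06, 0x60, 0xac})) // 13
}
```

The trace has the same shape as loop()'s output. It shows pc 0 (bipush), pc 2 (iconst_3) and pc 3 (iadd), and then run() returns 13.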

 

Next comes NewInstruction(). It is one large switch statement that creates the concrete instruction for each opcode. The code is in instructions\factory.go:

package instructions

import "fmt"
import "jvmgo/ch05/instructions/base"
import . "jvmgo/ch05/instructions/comparisons"
import . "jvmgo/ch05/instructions/constants"
import . "jvmgo/ch05/instructions/control"
import . "jvmgo/ch05/instructions/conversions"
import . "jvmgo/ch05/instructions/extended"
import . "jvmgo/ch05/instructions/loads"
import . "jvmgo/ch05/instructions/math"
import . "jvmgo/ch05/instructions/stack"
import . "jvmgo/ch05/instructions/stores"

// NoOperandsInstruction singletons
var (
    nop = &NOP{}
    aconst_null = &ACONST_NULL{}
    iconst_m1 = &ICONST_M1{}
    iconst_0 = &ICONST_0{}
    iconst_1 = &ICONST_1{}
    iconst_2 = &ICONST_2{}
    iconst_3 = &ICONST_3{}
    iconst_4 = &ICONST_4{}
    iconst_5 = &ICONST_5{}
    lconst_0 = &LCONST_0{}
    lconst_1 = &LCONST_1{}
    fconst_0 = &FCONST_0{}
    fconst_1 = &FCONST_1{}
    fconst_2 = &FCONST_2{}
    dconst_0 = &DCONST_0{}
    dconst_1 = &DCONST_1{}
    iload_0 = &ILOAD_0{}
    iload_1 = &ILOAD_1{}
    iload_2 = &ILOAD_2{}
    iload_3 = &ILOAD_3{}
    lload_0 = &LLOAD_0{}
    lload_1 = &LLOAD_1{}
    lload_2 = &LLOAD_2{}
    lload_3 = &LLOAD_3{}
    fload_0 = &FLOAD_0{}
    fload_1 = &FLOAD_1{}
    fload_2 = &FLOAD_2{}
    fload_3 = &FLOAD_3{}
    dload_0 = &DLOAD_0{}
    dload_1 = &DLOAD_1{}
    dload_2 = &DLOAD_2{}
    dload_3 = &DLOAD_3{}
    aload_0 = &ALOAD_0{}
    aload_1 = &ALOAD_1{}
    aload_2 = &ALOAD_2{}
    aload_3 = &ALOAD_3{}
    // iaload = &IALOAD{}
    // laload = &LALOAD{}
    // faload = &FALOAD{}
    // daload = &DALOAD{}
    // aaload = &AALOAD{}
    // baload = &BALOAD{}
    // caload = &CALOAD{}
    // saload = &SALOAD{}
    istore_0 = &ISTORE_0{}
    istore_1 = &ISTORE_1{}
    istore_2 = &ISTORE_2{}
    istore_3 = &ISTORE_3{}
    lstore_0 = &LSTORE_0{}
    lstore_1 = &LSTORE_1{}
    lstore_2 = &LSTORE_2{}
    lstore_3 = &LSTORE_3{}
    fstore_0 = &FSTORE_0{}
    fstore_1 = &FSTORE_1{}
    fstore_2 = &FSTORE_2{}
    fstore_3 = &FSTORE_3{}
    dstore_0 = &DSTORE_0{}
    dstore_1 = &DSTORE_1{}
    dstore_2 = &DSTORE_2{}
    dstore_3 = &DSTORE_3{}
    astore_0 = &ASTORE_0{}
    astore_1 = &ASTORE_1{}
    astore_2 = &ASTORE_2{}
    astore_3 = &ASTORE_3{}
    // iastore = &IASTORE{}
    // lastore = &LASTORE{}
    // fastore = &FASTORE{}
    // dastore = &DASTORE{}
    // aastore = &AASTORE{}
    // bastore = &BASTORE{}
    // castore = &CASTORE{}
    // sastore = &SASTORE{}
    pop = &POP{}
    pop2 = &POP2{}
    dup = &DUP{}
    dup_x1 = &DUP_X1{}
    dup_x2 = &DUP_X2{}
    dup2 = &DUP2{}
    dup2_x1 = &DUP2_X1{}
    dup2_x2 = &DUP2_X2{}
    swap = &SWAP{}
    iadd = &IADD{}
    ladd = &LADD{}
    fadd = &FADD{}
    dadd = &DADD{}
    isub = &ISUB{}
    lsub = &LSUB{}
    fsub = &FSUB{}
    dsub = &DSUB{}
    imul = &IMUL{}
    lmul = &LMUL{}
    fmul = &FMUL{}
    dmul = &DMUL{}
    idiv = &IDIV{}
    ldiv = &LDIV{}
    fdiv = &FDIV{}
    ddiv = &DDIV{}
    irem = &IREM{}
    lrem = &LREM{}
    frem = &FREM{}
    drem = &DREM{}
    ineg = &INEG{}
    lneg = &LNEG{}
    fneg = &FNEG{}
    dneg = &DNEG{}
    ishl = &ISHL{}
    lshl = &LSHL{}
    ishr = &ISHR{}
    lshr = &LSHR{}
    iushr = &IUSHR{}
    lushr = &LUSHR{}
    iand = &IAND{}
    land = &LAND{}
    ior = &IOR{}
    lor = &LOR{}
    ixor = &IXOR{}
    lxor = &LXOR{}
    i2l = &I2L{}
    i2f = &I2F{}
    i2d = &I2D{}
    l2i = &L2I{}
    l2f = &L2F{}
    l2d = &L2D{}
    f2i = &F2I{}
    f2l = &F2L{}
    f2d = &F2D{}
    d2i = &D2I{}
    d2l = &D2L{}
    d2f = &D2F{}
    i2b = &I2B{}
    i2c = &I2C{}
    i2s = &I2S{}
    lcmp = &LCMP{}
    fcmpl = &FCMPL{}
    fcmpg = &FCMPG{}
    dcmpl = &DCMPL{}
    dcmpg = &DCMPG{}
    // ireturn = &IRETURN{}
    // lreturn = &LRETURN{}
    // freturn = &FRETURN{}
    // dreturn = &DRETURN{}
    // areturn = &ARETURN{}
    // _return = &RETURN{}
    // arraylength = &ARRAY_LENGTH{}
    // athrow = &ATHROW{}
    // monitorenter = &MONITOR_ENTER{}
    // monitorexit = &MONITOR_EXIT{}
    // invoke_native = &INVOKE_NATIVE{}
)

func NewInstruction(opcode byte) base.Instruction {
    switch opcode {
    case 0x00:
        return nop
    case 0x01:
        return aconst_null
    case 0x02:
        return iconst_m1
    case 0x03:
        return iconst_0
    case 0x04:
        return iconst_1
    case 0x05:
        return iconst_2
    case 0x06:
        return iconst_3
    case 0x07:
        return iconst_4
    case 0x08:
        return iconst_5
    case 0x09:
        return lconst_0
    case 0x0a:
        return lconst_1
    case 0x0b:
        return fconst_0
    case 0x0c:
        return fconst_1
    case 0x0d:
        return fconst_2
    case 0x0e:
        return dconst_0
    case 0x0f:
        return dconst_1
    case 0x10:
        return &BIPUSH{}
    case 0x11:
        return &SIPUSH{}
    // case 0x12:
        // return &LDC{}
    // case 0x13:
        // return &LDC_W{}
    // case 0x14:
        // return &LDC2_W{}
    case 0x15:
        return &ILOAD{}
    case 0x16:
        return &LLOAD{}
    case 0x17:
        return &FLOAD{}
    case 0x18:
        return &DLOAD{}
    case 0x19:
        return &ALOAD{}
    case 0x1a:
        return iload_0
    case 0x1b:
        return iload_1
    case 0x1c:
        return iload_2
    case 0x1d:
        return iload_3
    case 0x1e:
        return lload_0
    case 0x1f:
        return lload_1
    case 0x20:
        return lload_2
    case 0x21:
        return lload_3
    case 0x22:
        return fload_0
    case 0x23:
        return fload_1
    case 0x24:
        return fload_2
    case 0x25:
        return fload_3
    case 0x26:
        return dload_0
    case 0x27:
        return dload_1
    case 0x28:
        return dload_2
    case 0x29:
        return dload_3
    case 0x2a:
        return aload_0
    case 0x2b:
        return aload_1
    case 0x2c:
        return aload_2
    case 0x2d:
        return aload_3
    // case 0x2e:
        // return iaload
    // case 0x2f:
        // return laload
    // case 0x30:
        // return faload
    // case 0x31:
        // return daload
    // case 0x32:
        // return aaload
    // case 0x33:
        // return baload
    // case 0x34:
        // return caload
    // case 0x35:
        // return saload
    case 0x36:
        return &ISTORE{}
    case 0x37:
        return &LSTORE{}
    case 0x38:
        return &FSTORE{}
    case 0x39:
        return &DSTORE{}
    case 0x3a:
        return &ASTORE{}
    case 0x3b:
        return istore_0
    case 0x3c:
        return istore_1
    case 0x3d:
        return istore_2
    case 0x3e:
        return istore_3
    case 0x3f:
        return lstore_0
    case 0x40:
        return lstore_1
    case 0x41:
        return lstore_2
    case 0x42:
        return lstore_3
    case 0x43:
        return fstore_0
    case 0x44:
        return fstore_1
    case 0x45:
        return fstore_2
    case 0x46:
        return fstore_3
    case 0x47:
        return dstore_0
    case 0x48:
        return dstore_1
    case 0x49:
        return dstore_2
    case 0x4a:
        return dstore_3
    case 0x4b:
        return astore_0
    case 0x4c:
        return astore_1
    case 0x4d:
        return astore_2
    case 0x4e:
        return astore_3
    // case 0x4f:
        // return iastore
    // case 0x50:
        // return lastore
    // case 0x51:
        // return fastore
    // case 0x52:
        // return dastore
    // case 0x53:
        // return aastore
    // case 0x54:
        // return bastore
    // case 0x55:
        // return castore
    // case 0x56:
        // return sastore
    case 0x57:
        return pop
    case 0x58:
        return pop2
    case 0x59:
        return dup
    case 0x5a:
        return dup_x1
    case 0x5b:
        return dup_x2
    case 0x5c:
        return dup2
    case 0x5d:
        return dup2_x1
    case 0x5e:
        return dup2_x2
    case 0x5f:
        return swap
    case 0x60:
        return iadd
    case 0x61:
        return ladd
    case 0x62:
        return fadd
    case 0x63:
        return dadd
    case 0x64:
        return isub
    case 0x65:
        return lsub
    case 0x66:
        return fsub
    case 0x67:
        return dsub
    case 0x68:
        return imul
    case 0x69:
        return lmul
    case 0x6a:
        return fmul
    case 0x6b:
        return dmul
    case 0x6c:
        return idiv
    case 0x6d:
        return ldiv
    case 0x6e:
        return fdiv
    case 0x6f:
        return ddiv
    case 0x70:
        return irem
    case 0x71:
        return lrem
    case 0x72:
        return frem
    case 0x73:
        return drem
    case 0x74:
        return ineg
    case 0x75:
        return lneg
    case 0x76:
        return fneg
    case 0x77:
        return dneg
    case 0x78:
        return ishl
    case 0x79:
        return lshl
    case 0x7a:
        return ishr
    case 0x7b:
        return lshr
    case 0x7c:
        return iushr
    case 0x7d:
        return lushr
    case 0x7e:
        return iand
    case 0x7f:
        return land
    case 0x80:
        return ior
    case 0x81:
        return lor
    case 0x82:
        return ixor
    case 0x83:
        return lxor
    case 0x84:
        return &IINC{}
    case 0x85:
        return i2l
    case 0x86:
        return i2f
    case 0x87:
        return i2d
    case 0x88:
        return l2i
    case 0x89:
        return l2f
    case 0x8a:
        return l2d
    case 0x8b:
        return f2i
    case 0x8c:
        return f2l
    case 0x8d:
        return f2d
    case 0x8e:
        return d2i
    case 0x8f:
        return d2l
    case 0x90:
        return d2f
    case 0x91:
        return i2b
    case 0x92:
        return i2c
    case 0x93:
        return i2s
    case 0x94:
        return lcmp
    case 0x95:
        return fcmpl
    case 0x96:
        return fcmpg
    case 0x97:
        return dcmpl
    case 0x98:
        return dcmpg
    case 0x99:
        return &IFEQ{}
    case 0x9a:
        return &IFNE{}
    case 0x9b:
        return &IFLT{}
    case 0x9c:
        return &IFGE{}
    case 0x9d:
        return &IFGT{}
    case 0x9e:
        return &IFLE{}
    case 0x9f:
        return &IF_ICMPEQ{}
    case 0xa0:
        return &IF_ICMPNE{}
    case 0xa1:
        return &IF_ICMPLT{}
    case 0xa2:
        return &IF_ICMPGE{}
    case 0xa3:
        return &IF_ICMPGT{}
    case 0xa4:
        return &IF_ICMPLE{}
    case 0xa5:
        return &IF_ACMPEQ{}
    case 0xa6:
        return &IF_ACMPNE{}
    case 0xa7:
        return &GOTO{}
    // case 0xa8:
        // return &JSR{}
    // case 0xa9:
        // return &RET{}
    case 0xaa:
        return &TABLE_SWITCH{}
    case 0xab:
        return &LOOKUP_SWITCH{}
    // case 0xac:
        // return ireturn
    // case 0xad:
        // return lreturn
    // case 0xae:
        // return freturn
    // case 0xaf:
        // return dreturn
    // case 0xb0:
        // return areturn
    // case 0xb1:
        // return _return
    // case 0xb2:
        // return &GET_STATIC{}
    // case 0xb3:
        // return &PUT_STATIC{}
    // case 0xb4:
        // return &GET_FIELD{}
    // case 0xb5:
        // return &PUT_FIELD{}
    // case 0xb6:
        // return &INVOKE_VIRTUAL{}
    // case 0xb7:
        // return &INVOKE_SPECIAL{}
    // case 0xb8:
        // return &INVOKE_STATIC{}
    // case 0xb9:
        // return &INVOKE_INTERFACE{}
    // case 0xba:
        // return &INVOKE_DYNAMIC{}
    // case 0xbb:
        // return &NEW{}
    // case 0xbc:
        // return &NEW_ARRAY{}
    // case 0xbd:
        // return &ANEW_ARRAY{}
    // case 0xbe:
        // return arraylength
    // case 0xbf:
        // return athrow
    // case 0xc0:
        // return &CHECK_CAST{}
    // case 0xc1:
        // return &INSTANCE_OF{}
    // case 0xc2:
        // return monitorenter
    // case 0xc3:
        // return monitorexit
    case 0xc4:
        return &WIDE{}
    // case 0xc5:
        // return &MULTI_ANEW_ARRAY{}
    case 0xc6:
        return &IFNULL{}
    case 0xc7:
        return &IFNONNULL{}
    case 0xc8:
        return &GOTO_W{}
    // case 0xc9:
        // return &JSR_W{}
    // case 0xca: breakpoint
    // case 0xfe: impdep1
    // case 0xff: impdep2
    default:
        panic(fmt.Errorf("Unsupported opcode: 0x%x!", opcode))
    }
}

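Many instructions have no operands, so there is no need to create a new instance each time. As an optimization, each of these instructions is defined once as a singleton variable, as in the var block at the top of factory.go (nop = &NOP{}, iadd = &IADD{}, and so on).

For these instructions, NewInstruction() simply returns the singleton; for example, case 0x00 returns nop. Instructions that carry operands, such as &BIPUSH{} or &ILOAD{}, are still allocated on every call.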
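Sharing an instance is only safe because an instruction without operands carries no per-use state. BIPUSH, by contrast, stores its operand in the struct during FetchOperands(). With a shared instance, every bipush site would read and write the same field. The following minimal sketch shows both cases. Its Instruction interface is a simplified, hypothetical variant of base.Instruction that passes the bytecode and pc explicitly.

```go
package main

import "fmt"

// Instruction is a simplified, hypothetical version of base.Instruction.
type Instruction interface {
	FetchOperands(code []byte, pc int) int // returns the pc after the operands
	Execute(stack *[]int32)
}

// IADD has no operands, so one shared instance can serve every iadd site.
type IADD struct{}

func (*IADD) FetchOperands(code []byte, pc int) int { return pc }

func (*IADD) Execute(stack *[]int32) {
	s := *stack
	n := len(s)
	*stack = append(s[:n-2], s[n-2]+s[n-1])
}

// BIPUSH stores its operand in the struct, so each decode needs its own instance.
type BIPUSH struct{ val int8 }

func (b *BIPUSH) FetchOperands(code []byte, pc int) int {
	b.val = int8(code[pc])
	return pc + 1
}

func (b *BIPUSH) Execute(stack *[]int32) { *stack = append(*stack, int32(b.val)) }

var iadd = &IADD{} // singleton, like the var block in factory.go

func newInstruction(opcode byte) Instruction {
	switch opcode {
	case 0x10:
		return &BIPUSH{}
	case 0x60:
		return iadd
	default:
		panic(fmt.Errorf("Unsupported opcode: 0x%x!", opcode))
	}
}

// exec decodes and executes code from start to end.
func exec(code []byte) []int32 {
	var stack []int32
	for pc := 0; pc < len(code); {
		inst := newInstruction(code[pc])
		pc = inst.FetchOperands(code, pc+1)
		inst.Execute(&stack)
	}
	return stack
}

func main() {
	fmt.Println(newInstruction(0x60) == newInstruction(0x60)) // true: same singleton
	fmt.Println(newInstruction(0x10) == newInstruction(0x10)) // false: two allocations
	fmt.Println(exec([]byte{0x10, 7, 0x10, 5, 0x60}))         // [12]
}
```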

 

2. Test code

Let's put the virtual machine to the test.

Java code:

package jvmgo.book.ch03;

public class GaussShu {
    public static void main(String[] args) {
        int sum = 0;
        for (int i = 1; i <= 100; i++) {
            sum += i;
        }
        System.out.println(sum);
    }
}

 

Next, rework main.go, starting with the import statements. The full modified file:

package main

import (
    "fmt"
    "jvmgo/ch05/classfile"
    "jvmgo/ch05/classpath"
    "strings"
)


func main() {
    cmd := parseCmd()

    if cmd.versionFlag {
        fmt.Println("version 0.0.1")
    } else if cmd.helpFlag || cmd.class == "" {
        printUsage()
    } else {
        startJVM(cmd)
    }
}

func startJVM(cmd *Cmd) {
    cp := classpath.Parse(cmd.XjreOption, cmd.cpOption)
    className := strings.Replace(cmd.class, ".", "/", -1)
    cf := loadClass(className, cp)
    mainMethod := getMainMethod(cf)
    if mainMethod != nil {
        interpret(mainMethod)
    } else {
        fmt.Printf("Main method not found in class %s\n", cmd.class)
    }
}

func loadClass(className string, cp *classpath.Classpath) *classfile.ClassFile {
    classData, _, err := cp.ReadClass(className)
    if err != nil {
        panic(err)
    }

    cf, err := classfile.Parse(classData)
    if err != nil {
        panic(err)
    }

    return cf
}

func getMainMethod(cf *classfile.ClassFile) *classfile.MemberInfo {
    for _, m := range cf.Methods() {
        if m.Name() == "main" && m.Descriptor() == "([Ljava/lang/String;)V" {
            return m
        }
    }
    return nil
}

 

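main() stays the same; startJVM() is what changes. startJVM() proceeds in three steps:
1. It calls loadClass() to read and parse the class file.
2. It calls getMainMethod() to find the class's main() method.
3. It calls interpret() to execute main().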

 

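loadClass() reads the class bytes from the class path with cp.ReadClass(). It then parses them with classfile.Parse() and panics if either step fails.

getMainMethod() scans the class's methods. It returns the one named main whose descriptor is ([Ljava/lang/String;)V, or nil if there is none.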
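Two details in this code are easy to miss:
- The class name from the command line uses dots, but the class path is searched by internal name, with slashes.
- main() is matched by name and descriptor together. ([Ljava/lang/String;)V means a method that takes a String[] and returns void.

The following self-contained sketch uses a hypothetical method struct in place of classfile.MemberInfo. Like getMainMethod(), it does not check the public/static access flags. strings.ReplaceAll(s, old, new) is equivalent to strings.Replace(s, old, new, -1).

```go
package main

import (
	"fmt"
	"strings"
)

// method is a hypothetical stand-in for classfile.MemberInfo.
type method struct {
	name, descriptor string
}

// internalName turns a binary class name into the slash form used for lookups.
func internalName(className string) string {
	return strings.ReplaceAll(className, ".", "/")
}

// findMain matches on name and descriptor only, like getMainMethod().
func findMain(methods []method) *method {
	for i := range methods {
		m := &methods[i]
		if m.name == "main" && m.descriptor == "([Ljava/lang/String;)V" {
			return m
		}
	}
	return nil
}

func main() {
	fmt.Println(internalName("jvmgo.book.ch03.GaussShu")) // jvmgo/book/ch03/GaussShu

	methods := []method{
		{"<init>", "()V"},
		{"main", "()V"}, // same name, wrong descriptor: skipped
		{"main", "([Ljava/lang/String;)V"},
	}
	if m := findMain(methods); m != nil {
		fmt.Println(m.name + m.descriptor) // main([Ljava/lang/String;)V
	}
}
```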

 

Open a command-line window and run the following command to compile this chapter's code:

go install jvmgo\ch05

I put the compiled GaussShu.class file in D:\MAT_log and ran:

ch05 -classpath D:\MAT_log -Xjre "D:\software\java\jre" GaussShu

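Execution ends with an error, which is expected. In this example the first unsupported opcode the interpreter reaches is getstatic (0xb2), which loads System.out for the println() call. The local variable table and operand stack are printed before the panic propagates, and the number 5050 can be seen in the local variable table.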
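The following sketch shows where 5050 comes from: it hand-interprets bytecode for the loop in GaussShu.main() until it reaches getstatic. The byte sequence is an assumption, namely what javac typically emits for this loop. Compare it with your own class file using javap -c GaussShu. The sketch makes three further simplifications:
- Only the opcodes the loop needs are implemented.
- Branch offsets are taken relative to the branch opcode, as in Branch().
- The constant-pool index after getstatic is a placeholder.

```go
package main

import "fmt"

// Assumed bytecode for GaussShu.main() up to the println call.
var code = []byte{
	0x03,             //  0: iconst_0
	0x3c,             //  1: istore_1      sum = 0
	0x04,             //  2: iconst_1
	0x3d,             //  3: istore_2      i = 1
	0x1c,             //  4: iload_2
	0x10, 0x64,       //  5: bipush 100
	0xa3, 0x00, 0x0d, //  7: if_icmpgt +13 (to 20)
	0x1b,             // 10: iload_1
	0x1c,             // 11: iload_2
	0x60,             // 12: iadd
	0x3c,             // 13: istore_1      sum += i
	0x84, 0x02, 0x01, // 14: iinc 2, 1     i++
	0xa7, 0xff, 0xf3, // 17: goto -13 (to 4)
	0xb2, 0x00, 0x02, // 20: getstatic System.out (not implemented)
}

// run interprets code and panics at the first unsupported opcode.
func run(locals []int32) {
	var stack []int32
	push := func(v int32) { stack = append(stack, v) }
	pop := func() int32 {
		v := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		return v
	}
	// i16 reads a big-endian signed 16-bit branch offset.
	i16 := func(at int) int { return int(int16(uint16(code[at])<<8 | uint16(code[at+1]))) }

	for pc := 0; ; {
		switch op := code[pc]; op {
		case 0x03, 0x04: // iconst_0, iconst_1
			push(int32(op) - 0x03)
			pc++
		case 0x3c, 0x3d: // istore_1, istore_2
			locals[op-0x3b] = pop()
			pc++
		case 0x1b, 0x1c: // iload_1, iload_2
			push(locals[op-0x1a])
			pc++
		case 0x10: // bipush
			push(int32(int8(code[pc+1])))
			pc += 2
		case 0x60: // iadd
			b, a := pop(), pop()
			push(a + b)
			pc++
		case 0x84: // iinc index, const
			locals[code[pc+1]] += int32(int8(code[pc+2]))
			pc += 3
		case 0xa3: // if_icmpgt: branch if value1 > value2
			v2, v1 := pop(), pop()
			if v1 > v2 {
				pc += i16(pc + 1)
			} else {
				pc += 3
			}
		case 0xa7: // goto
			pc += i16(pc + 1)
		default:
			panic(fmt.Errorf("Unsupported opcode: 0x%x!", op))
		}
	}
}

// runUntilUnsupported recovers the panic, playing the role of catchErr().
func runUntilUnsupported() (locals []int32, err error) {
	locals = make([]int32, 3) // slot 0: args (unused here), 1: sum, 2: i
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	run(locals)
	return locals, nil
}

func main() {
	locals, err := runUntilUnsupported()
	fmt.Println(err)                     // Unsupported opcode: 0xb2!
	fmt.Printf("LocalVars:%v\n", locals) // LocalVars:[0 5050 101]
}
```

Slot 1 holds sum. Slot 2 holds i, which stopped at 101 when the i <= 100 test failed.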

 

3. References

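Song Hongkang (Shangguigu): complete JVM course: https://www.bilibili.com/video/BV1PJ411n7xZ

Zhou Zhiming: 深入理解Java虚拟机 (Understanding the Java Virtual Machine)

Zhang Xiuhong: 自己动手写Java虚拟机 (Write Your Own Java Virtual Machine, Java Core Technology Series)

Go official website: Standard library – Go Packages

The Java Virtual Machine Specification: Chapter 4. The class File Format (oracle.com)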
