Two-path 3D CNNs for calibration of system parameters for OCT-based motion compensation
Nils Gessert∗,a, Martin Gromniak∗,a, Matthias Schlüter a, and Alexander Schlaefer a
a Institute of Medical Technology, Hamburg University of Technology, Am Schwarzenberg-Campus 3, 21073 Hamburg, Germany
arXiv:1810.09582v1 [cs.CV] 22 Oct 2018
ABSTRACT
Automatic motion compensation and adjustment of an intraoperative imaging modality’s field of view is a common problem during interventions. Optical coherence tomography (OCT) is an imaging modality which is used in interventions due to its high spatial resolution of a few micrometers and its temporal resolution of potentially several hundred volumes per second. However, performing motion compensation with OCT is problematic due to its small field of view, which might lead to tracked objects being lost quickly. We propose a novel deep learning-based approach that learns the input parameters of the motors that move the scan area for motion compensation directly from OCT volumes. We design a two-path 3D convolutional neural network (CNN) architecture that takes two volumes with an object to be tracked as its input and predicts the motor input parameters necessary to compensate for the object’s movement. In this way, we learn the calibration between object movement and system parameters for motion compensation with arbitrary objects. Thus, we avoid the error-prone hand-eye calibration and handcrafted feature tracking of classical approaches. We achieve an average correlation coefficient of 0.998 between predicted and ground-truth motor parameters, which leads to sub-voxel accuracy. Furthermore, we show that our deep learning model is real-time capable for use with the system’s high volume acquisition frequency.
Keywords: Deep Learning, 3D CNN, Motion Compensation, OCT
1. INTRODUCTION
Optical coherence tomography (OCT) is an interferometric imaging modality that allows for volumetric imaging with micrometer-level resolution. OCT has been used in intraoperative scenarios [1, 2] such as neurosurgery [3] and ophthalmic surgery [4]. Recently, systems with high-frequency acquisition have been proposed [5, 6], which allow for fast imaging and object tracking during interventions. As the field of view (FOV) of high-resolution imaging modalities is often limited, the current region-of-interest (ROI) might be lost quickly due to patient and surgical tool movement. As manual adjustment of the imaging system’s FOV disrupts the surgical workflow, automatic adjustment is desirable for keeping track of the current ROI. So far, OCT-based tracking and compensation can be performed with markerless approaches using cumbersome and potentially error-prone image-based registration [7]. Alternatively, markers can be introduced to the setup, which can be invasive but promise higher accuracy. For example, detection of artificial landmarks carved into bone structures has been shown [8]. More recently, a deep learning-based method has been proposed where a model learns to estimate the pose of a very small, arbitrary marker geometry directly from OCT volumes [9]. For motion compensation, all these methods require a hand-eye calibration between the imaging system and the compensation device, which is difficult for OCT [10]. In this paper, we propose a calibration strategy between OCT volumes and a compensation system for marker-based tracking. We consider the setup shown in Figure 1 with an OCT system that has a mechanism for lateral and axial FOV adjustment. Two motors control mirrors for lateral beam redirection and one motor controls a reference arm for adjusting the scan distance. Thus, the motors can be used to compensate motion of the marker object and keep it within the FOV.
In order to compensate motion of the object in this setup, a classic approach first requires an OCT-based hand-eye calibration. Second, either an image-based registration of OCT volumes [11] is required or a known marker geometry needs to be detected. Instead, we propose a new direct calibration approach between volumes and motors that combines both steps in a single deep learning model. For this purpose, we extend the recent idea of a 3D convolutional neural network (CNN) model for the estimation of an arbitrary marker’s pose [9] to the calibration problem. Instead of a single volume, our model receives two volumes with an object in different areas of the FOV. The two volumes are processed with a two-path 3D CNN architecture [12]. At the output, the model predicts the motor steps that need to be driven to compensate the motion between the two object states. In this way, we combine hand-eye calibration of OCT volumes with motors and marker detection in a single trainable model. For training of the two-path 3D CNN model we acquire 7 datasets of an object, and we show with a separate dataset that the model learns to compensate the object’s motion. Thus, the object can be effectively used as a marker to keep track of a desired region of interest. Last, we show that the model has low inference times, which allows for real-time estimation despite performing volumetric data processing.

Figure 1: The object geometry to be tracked is shown under a digital microscope (left) and in a rendered OCT volume (center). The experimental setup for motion compensation and data acquisition is shown as a draft (right). Labels in the draft: Scan Head, Lens, Motor 1, Motor 2, Motor 3, Volume, Marker Object, Movement.

Further author information: (Send correspondence to Nils Gessert) Nils Gessert: [email protected] Martin Gromniak: [email protected] ∗ Both authors contributed equally.
2. METHODS AND MATERIALS
2.1 Experimental Setup
The setup for OCT-based motion compensation and the object to be tracked are shown in Figure 1. The marker object is milled from a polyoxymethylene block with a size of 1 mm × 1 mm × 1 mm. We carved out an inner structure in order to have subsurface features that can be imaged by OCT and exploited by deep learning models, as recently suggested [9]. The setup itself consists of a swept-source OCT device (OMES, Optores) with an A-scan rate of 1.59 MHz. We use a scan head that provides volumes of size 32 × 32 × 460 voxels. This leads to a potential acquisition speed of 833 volumes per second. For a uniform volume size and reduced processing time, we downsample the volumes to a size of 32 × 32 × 32 voxels, which covers a volume of approximately 3 mm × 3 mm × 3.5 mm. The volume’s position in space can be adjusted by three stepper motors. Two motors control mirrors that can laterally move the FOV by ≈60 mm. The third motor moves the mirror in the reference arm in a range of ≈160 mm. For data acquisition, we also use a hexapod robot with the marker object attached to it. The robot’s purpose is to move the object into different orientations for a higher variability in object appearance.
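The text states that volumes are downsampled from 32 × 32 × 460 to 32 × 32 × 32 voxels but does not say how. As a minimal sketch, assuming simple averaging over roughly equal depth bins (an assumption, not necessarily the method used here), the depth reduction could look as follows. The function names are hypothetical.

```python
def downsample_ascan(ascan, out_len=32):
    """Average a 1D depth profile into out_len roughly equal bins.

    Bin averaging is an assumed method; the paper does not specify one.
    """
    n = len(ascan)
    if out_len <= 0 or n < out_len:
        raise ValueError("need 0 < out_len <= len(ascan)")
    out = []
    for i in range(out_len):
        # Integer bin borders; with n = 460 every bin holds 14 or 15 samples.
        start = i * n // out_len
        stop = (i + 1) * n // out_len
        chunk = ascan[start:stop]
        out.append(sum(chunk) / len(chunk))
    return out


def downsample_volume(volume, out_depth=32):
    """Apply the depth reduction to every A-scan of a volume[x][y][z]."""
    return [[downsample_ascan(ascan, out_depth) for ascan in row]
            for row in volume]
```

Only the depth axis is touched, so the 32 × 32 lateral grid is preserved.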
Figure 2: The proposed two-path 3D CNN architecture. In each block, the change in the number of feature maps is denoted. F refers to the base number of feature maps. ResBlock refers to residual blocks as introduced by He et al. [13]. /2 denotes spatial downsampling with a stride of 2. Note that the model parameters are shared between the two initial paths. Concat denotes tensor concatenation along the feature map dimension. GAP denotes global average pooling. GAP is followed by a single linear fully-connected layer. The ResBlocks marked with an asterisk are omitted in the reduced architecture.

Table 1: Performance results on the test set and inference times for the different models. MAE refers to the mean absolute error in motor steps. 2.5 motor steps in x and y direction roughly correspond to a shift of one voxel. For the z direction, roughly 190 motor steps correspond to a shift of one voxel. ACC denotes the average correlation coefficient between predictions and targets. F denotes the base number of feature maps, see Figure 2. Red. denotes a reduced architecture with fewer ResBlocks.

Model                MAE ∆x (steps)   MAE ∆y (steps)   MAE ∆z (steps)   ACC      Inf. time (ms)
ResNet F = 60        1.628 ± 1.326    1.426 ± 1.166    42.41 ± 34.34    0.9983   7.51 ± 0.12
ResNet F = 45        1.990 ± 1.606    1.634 ± 1.290    48.06 ± 36.45    0.9984   5.40 ± 0.12
ResNet F = 30        1.633 ± 1.270    1.285 ± 1.078    43.83 ± 34.84    0.9983   3.73 ± 0.16
ResNet F = 15        1.792 ± 1.393    1.564 ± 1.243    53.12 ± 42.13    0.9978   2.51 ± 0.17
ResNet F = 15 Red.   2.008 ± 1.619    1.728 ± 1.405    55.91 ± 45.84    0.9966   1.79 ± 0.16

2.2 Data Acquisition
For training the deep learning model, a large, labeled dataset is required, which we acquire automatically with the setup. In each step, the hexapod moves the object into a random orientation. Then, we move the FOV to two randomly generated motor states s1 and s2 and acquire a volume in each state. Thus, a single labeled example consists of two volumes and the label sd = s1 − s2 = (∆x, ∆y, ∆z), which is the motor step difference that needs to be driven to overlay the volumes on top of each other. In total, we acquire 7 datasets with approximately 5000 examples each. Between each dataset acquisition we rearrange the marker in order to avoid overfitting to a particular initial marker pose.
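The labeling scheme of Section 2.2 can be illustrated with a short sketch: two random motor states are drawn, a volume is acquired in each, and the label is their difference sd = s1 − s2. The motor step ranges, the function names, and the `acquire_volume` callback are hypothetical placeholders, since the actual sampling ranges and device interface are not given in the text.

```python
import random

# Hypothetical motor step ranges for (x, y, z); the real ranges are not stated.
ASSUMED_RANGES = [(-40, 40), (-40, 40), (-2000, 2000)]


def sample_motor_state(ranges, rng):
    """Draw one motor state (x, y, z) uniformly within the given step ranges."""
    return tuple(rng.randint(lo, hi) for lo, hi in ranges)


def make_label(s1, s2):
    """Label s_d = s1 - s2 = (dx, dy, dz) in motor steps."""
    return tuple(a - b for a, b in zip(s1, s2))


def acquire_example(ranges, rng, acquire_volume):
    """Return ((volume_1, volume_2), s_d) for two random motor states.

    acquire_volume is a placeholder for moving the FOV to a state and
    recording one OCT volume there.
    """
    s1 = sample_motor_state(ranges, rng)
    s2 = sample_motor_state(ranges, rng)
    return (acquire_volume(s1), acquire_volume(s2)), make_label(s1, s2)
```

With this sign convention, driving the motors by sd starting from state s2 brings the FOV back to state s1.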
2.3 Model
The two-path 3D CNN architecture we employ is shown in Figure 2. Each path receives a volume which is processed independently up to a concatenation point. Afterwards, the features are processed jointly and finally, the state difference sd that would be required for compensation is predicted at the output. We rely on the ResNet principle [14] with identity connections for improved gradient propagation. We also share parameters between the two paths, as they receive similar volumes and are therefore likely to require similar features. As the OCT system’s volume acquisition rate is very high, we investigate the trade-off between performance and inference time by considering downscaled variants of the model shown in Figure 2. For this purpose we vary the base feature map size F, which controls the overall capacity of the model. Also, we consider a reduced version of the architecture with fewer ResBlocks. We train the models with a mean squared error loss using the Adam [15] optimizer, a constant learning rate of 5e−4 and a batch size of 40. We hold out one entire, independently acquired dataset with 5000 examples for testing. We implement our model using Tensorflow [16]. Training and inference time tests are performed on an NVIDIA GeForce GTX 1080 Ti graphics card.
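The central structural idea of Section 2.3, two paths with shared parameters whose features are concatenated and mapped to three outputs, can be sketched in plain Python. This is a deliberately tiny stand-in and not the architecture of Figure 2: the shared path is reduced to a per-voxel linear map with ReLU (equivalent to a 1 × 1 × 1 convolution) followed by global average pooling, and pooling happens before concatenation rather than after joint ResBlocks. All names and parameter values are hypothetical.

```python
def encode(volume, shared_params):
    """Shared path: per-voxel linear map + ReLU, then global average pooling.

    volume is a nested list volume[x][y][z]; shared_params is a list of
    (weight, bias) pairs, one per feature map.
    """
    voxels = [v for plane in volume for row in plane for v in row]
    features = []
    for w, b in shared_params:
        activations = [max(0.0, w * v + b) for v in voxels]
        features.append(sum(activations) / len(activations))
    return features


def predict(vol_a, vol_b, shared_params, head_weights, head_bias):
    """Encode both volumes with the SAME parameters, concatenate, apply a linear head."""
    joint = encode(vol_a, shared_params) + encode(vol_b, shared_params)
    return [sum(w * f for w, f in zip(row, joint)) + b
            for row, b in zip(head_weights, head_bias)]
```

Because `encode` receives the same `shared_params` for both inputs, identical volumes always produce identical path features, which is what parameter sharing guarantees in the full model as well.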
Figure 3: Absolute errors versus the magnitude of required motor steps along each dimension for F = 30. The horizontal axes show the absolute motor steps along each dimension, grouped into ranges (for example 801-1200, 1201-1600, 1601-2000), while the vertical axes show the absolute errors of the predictions. The error increases only slightly up to ∼40 steps in the lateral x- and y-dimensions and remains almost constant in the z-dimension.
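An analysis like the one in Figure 3 groups absolute prediction errors by the magnitude of the required motor steps. The sketch below shows one way to do this for a single dimension; the half-open bin edges and the function name are assumptions and may differ from the binning used for the figure.

```python
def bin_abs_errors(labels, predictions, bin_width):
    """Mean absolute error per magnitude bin for one motor dimension.

    A label y falls into bin k if k * bin_width <= |y| < (k + 1) * bin_width.
    Returns {(lower_edge, upper_edge): mean_abs_error} for non-empty bins.
    """
    sums, counts = {}, {}
    for y, p in zip(labels, predictions):
        k = int(abs(y) // bin_width)
        sums[k] = sums.get(k, 0.0) + abs(p - y)
        counts[k] = counts.get(k, 0) + 1
    return {(k * bin_width, (k + 1) * bin_width): sums[k] / counts[k]
            for k in sorted(sums)}
```

For example, `bin_abs_errors([100, -300, 900, 1000], [110, -290, 950, 1030], 400)` yields a mean error of 10.0 for the range 0 to 400 and 40.0 for the range 800 to 1200.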
3. RESULTS
The performance results on the test set and inference times for several model variants with differently sized architectures are shown in Table 1. Overall, the models’ performance is very high with an ACC larger than 0.996. When downscaling the model, performance deteriorates slightly for base feature map sizes below F = 30. However, the inference time of the smallest model is substantially reduced to 23.8 % of the largest model’s inference time while the ACC barely changes. It is notable that the inference time drops even further when removing ResBlocks for F = 15 on top of the feature map reduction. Moreover, Figure 3 shows the absolute motor step errors depending on the absolute motor step distances of the labels along each dimension. With larger motor step distances that need to be compensated, the error would be expected to increase. The plot shows that this increase is only minor for a large portion of the motor step range. It thus indicates the range of motion magnitudes over which good tracking performance can be expected. Assuming a rough calibration factor of ∼2.5 lateral motor steps per voxel and a factor of ∼190 motor steps per voxel for the axial direction, our method qualitatively achieves sub-voxel accuracy. Considering the volume size of 3 mm × 3 mm × 3.5 mm with a resolution of 32 × 32 × 32 voxels, the mean absolute errors are well below 100 µm.
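The two metrics reported in Table 1 can be computed as sketched below. The sketch assumes that the value after "±" is the standard deviation of the absolute errors and that ACC is the mean of the per-dimension Pearson correlation coefficients; both readings are plausible but not spelled out in the text. Function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)


def evaluate(labels, predictions):
    """Per-dimension (MAE, std of absolute errors) and the average correlation.

    labels and predictions are lists of (dx, dy, dz) tuples in motor steps.
    """
    per_dim = []
    correlations = []
    for ys, ps in zip(zip(*labels), zip(*predictions)):
        errors = [abs(p - y) for y, p in zip(ys, ps)]
        mae = sum(errors) / len(errors)
        std = math.sqrt(sum((e - mae) ** 2 for e in errors) / len(errors))
        per_dim.append((mae, std))
        correlations.append(pearson(ys, ps))
    return per_dim, sum(correlations) / len(correlations)
```

Note that a constant offset in the predictions leaves the correlation at 1 while the MAE grows, which is why the table reports both quantities.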
4. DISCUSSION AND CONCLUSION
We propose a new deep learning-based method for OCT-based motion compensation. In particular, we avoid time-consuming and inaccurate volume-based registration with a subsequent hand-eye calibration by directly learning the calibration between marker object movement observed in 3D volumes and the motors which move the scan area. For this purpose, we use a two-path 3D CNN architecture that predicts the required motor steps for motion compensation of an object’s movement based on two input volumes. Considering the results in Table 1, the very high average correlation coefficient shows that the learning problem is well solved. Also, the absolute errors show that we qualitatively achieve sub-voxel accuracy, which translates to errors well below 100 µm. Figure 3 shows that even large distances to be compensated only lead to a minor increase in error. Thus, our model should be robust even towards rapid and large motion. As we reattached the object several times in between dataset acquisitions, the results indicate that the model is capable of learning to track the object. Thus, the object can be used as a marker in a region-of-interest that should be tracked. When using a more efficient, downscaled architecture, performance is slightly reduced as a trade-off for faster inference. With an inference time of 1.79 ms, the smallest model achieves a processing frequency of 559 volumes per second, which is of the same order of magnitude as the OCT’s acquisition frequency. This shows that our model does not constitute a bottleneck in the entire compensation process despite having to perform volumetric data processing. For future work, our calibration strategy could be extended by using different marker objects in order to achieve generalization to new marker types. Also, the more challenging motion compensation task of markerless tissue tracking could be addressed.
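The numerical claims in the results and discussion can be reproduced from the values in Table 1 and the stated volume geometry. The short sketch below is only a worked check; the helper names are hypothetical, and the voxel pitch is derived from the approximate volume size of 3 mm × 3 mm × 3.5 mm at 32 × 32 × 32 voxels.

```python
# Approximate voxel pitch in micrometers: physical extent / 32 voxels.
VOXEL_UM = {"x": 3000.0 / 32, "y": 3000.0 / 32, "z": 3500.0 / 32}
# Rough motor steps per voxel shift, as stated with Table 1.
STEPS_PER_VOXEL = {"x": 2.5, "y": 2.5, "z": 190.0}


def steps_to_voxels(steps, axis):
    """Convert an error in motor steps to a shift in voxels."""
    return steps / STEPS_PER_VOXEL[axis]


def steps_to_um(steps, axis):
    """Convert an error in motor steps to micrometers."""
    return steps_to_voxels(steps, axis) * VOXEL_UM[axis]


def volumes_per_second(inference_ms):
    """Processing frequency for a given inference time in milliseconds."""
    return 1000.0 / inference_ms


# F = 30 model: MAE of 1.633 steps in x is about 0.65 voxels or about 61 um.
print(steps_to_voxels(1.633, "x"), steps_to_um(1.633, "x"))
# Smallest model: 1.79 ms per volume gives about 559 volumes per second.
print(round(volumes_per_second(1.79)))
# Smallest vs. largest model inference time: about 23.8 %.
print(round(100 * 1.79 / 7.51, 1))
```

The same conversion applied to the largest reported errors (the reduced F = 15 model) still gives mean errors below one voxel along every axis.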
REFERENCES
[1] Lankenau, E., Klinger, D., Winter, C., Malik, A., Müller, H. H., Oelckers, S., Pau, H.-W., Just, T., and Hüttmann, G., “Combining optical coherence tomography (OCT) with an operating microscope,” in [Advances in Medical Engineering], 343–348, Springer (2007).
[2] Ehlers, J. P., Srivastava, S. K., Feiler, D., Noonan, A. I., Rollins, A. M., and Tao, Y. K., “Integrative advances for OCT-guided ophthalmic surgery and intraoperative OCT: microscope integration, surgical instrumentation, and heads-up display surgeon feedback,” PloS One 9(8), e105224 (2014).
[3] Finke, M., Kantelhardt, S., Schlaefer, A., Bruder, R., Lankenau, E., Giese, A., and Schweikard, A., “Automatic scanning of large tissue areas in neurosurgery using optical coherence tomography,” The International Journal of Medical Robotics and Computer Assisted Surgery 8(3), 327–336 (2012).
[4] Tao, Y. K., Srivastava, S. K., and Ehlers, J. P., “Microscope-integrated intraoperative OCT with electrically tunable focus and heads-up display for imaging of ophthalmic surgical maneuvers,” Biomed Opt Express 5(6), 1877–1885 (2014).
[5] Novais, E. A., Adhi, M., Moult, E. M., Louzada, R. N., Cole, E. D., Husvogt, L., Lee, B., Dang, S., Regatieri, C. V., Witkin, A. J., Baumal, C. R., Hornegger, J., Jayaraman, V., Fujimoto, J. G., Duker, J. S., and Waheed, N. K., “Choroidal neovascularization analyzed on ultrahigh-speed swept-source optical coherence tomography angiography compared to spectral-domain optical coherence tomography angiography,” American Journal of Ophthalmology 164, 80–88 (2016).
[6] Siddiqui, M., Nam, A. S., Tozburun, S., Lippok, N., Blatter, C., and Vakoc, B. J., “High-speed optical coherence tomography by circular interferometric ranging,” Nature Photonics 12(2), 111 (2018).
[7] Laves, M.-H., Schoob, A., Kahrs, L. A., Pfeiffer, T., Huber, R., and Ortmaier, T., “Feature tracking for automated volume of interest stabilization on 4D-OCT images,” in [SPIE Medical Imaging], 101350W–101350W (2017).
[8] Zhang, Y. and Wörn, H., “Optical coherence tomography as highly accurate optical tracking system,” in [IEEE/ASME International Conference on Advanced Intelligent Mechatronics, 2014], 1145–1150, IEEE, Piscataway, NJ (2014).
[9] Gessert, N., Schlüter, M., and Schlaefer, A., “A deep learning approach for pose estimation from volumetric OCT data,” Medical Image Analysis 46, 162–179 (2018).
[10] Rajput, O., Antoni, S.-T., Otte, C., Saathoff, T., Matthäus, L., and Schlaefer, A., “High accuracy 3D data acquisition using co-registered OCT and kinect,” in [IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems], 32–37 (2016).
[11] Niemeijer, M., Garvin, M. K., Lee, K., van Ginneken, B., Abràmoff, M. D., and Sonka, M., “Registration of 3D spectral OCT volumes using 3D SIFT feature point matching,” in [Medical Imaging: Image Processing], 7259, 72591I (2009).
[12] Gessert, N., Beringhoff, J., Otte, C., and Schlaefer, A., “Force estimation from OCT volumes using 3D CNNs,” International Journal of Computer Assisted Radiology and Surgery 13(7), 1073–1082 (2018).
[13] He, K., Zhang, X., Ren, S., and Sun, J., “Identity mappings in deep residual networks,” in [ECCV], 630–645 (2016).
[14] He, K., Zhang, X., Ren, S., and Sun, J., “Deep residual learning for image recognition,” in [IEEE Conference on Computer Vision and Pattern Recognition], 770–778 (2016).
[15] Kingma, D. and Ba, J., “Adam: A method for stochastic optimization,” in [ICLR], (2014).
[16] Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., and Devin, M., “Tensorflow: Large-scale machine learning on heterogeneous distributed systems,” arXiv preprint arXiv:1603.04467 (2016).