ABSTRACT
Many DTM schemes rely heavily on accurate knowledge of the chip’s dynamic thermal state to make optimal performance/temperature trade-off decisions.
1. INTRODUCTION
Most of today’s high-performance multi-core processors suffer from heavy power/thermal stress.
If the statistical characteristics of the power dissipation profile do not change over time, a Kalman filter based approach can generate optimal thermal estimates from sensor observations [4].
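A minimal sketch of the Kalman predict/update loop for thermal tracking, assuming a hypothetical scalar model: one temperature state T[n+1] = a*T[n] + b*P[n] + w[n] and one sensor y[n] = T[n] + v[n]. The paper's actual state-space model is multi-dimensional. Here the mean power P[n] is treated as known and its fluctuation is folded into the process noise variance q, which is where a change in the power profile's statistics would show up. All names and parameters are illustrative assumptions.

```python
def kalman_track(observations, powers, a, b, q, r, t0, c0):
    """Scalar Kalman filter for temperature tracking (hypothetical model).

    observations: noisy sensor readings y[n]
    powers:       assumed-known mean power input P[n]
    a, b:         thermal model coefficients
    q, r:         process and sensor noise variances
    t0, c0:       initial estimate and its error variance
    Returns the a posteriori temperature estimates.
    """
    t_est, c = t0, c0
    estimates = []
    for y, p in zip(observations, powers):
        # Predict: propagate the state and its error variance through the model.
        t_pred = a * t_est + b * p
        c_pred = a * c * a + q
        # Update: weight the sensor residual by the Kalman gain.
        k = c_pred / (c_pred + r)
        t_est = t_pred + k * (y - t_pred)
        c = (1 - k) * c_pred
        estimates.append(t_est)
    return estimates
```

With r = 0 (and q > 0) the gain is 1 and the filter simply trusts the sensor; with q = 0 and c0 = 0 the gain is 0 and it runs the model open-loop, ignoring the sensor.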
In this paper, we investigate the problem of
adaptive temperature tracking at runtime by considering the
dynamic changes in the statistical characteristics of the power
profile.
2. PRELIMINARY
2.1 System Dynamics
2.2 Kalman Filter Based Thermal Tracking
3. PROBLEM DEFINITION AND CHALLENGES
The statistical characteristics of each potential power state could be captured by simulating or experimenting with all potential application sets (integer vs. floating point, scientific vs. multimedia, etc.).
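As an illustration of what "statistical characteristics" could mean in practice, a minimal sketch that reduces per-state power traces to a mean and a variance, assuming each trace was recorded while the chip stayed in one labeled power state. The state labels, the trace format, and the restriction to first and second moments are assumptions; these two numbers map naturally onto the mean input and the process-noise variance of a Kalman filter.

```python
from statistics import fmean, pvariance

def characterize_power_states(traces):
    """Summarize each labeled power trace by (mean, variance) of its samples.

    traces: dict mapping a hypothetical power-state label to a list of
            power samples (e.g. watts) recorded while in that state.
    """
    return {state: (fmean(samples), pvariance(samples))
            for state, samples in traces.items()}

# Example with made-up samples for two application classes.
table = characterize_power_states({
    "integer": [10.0, 12.0, 14.0],
    "floating_point": [20.0, 20.0],
})
```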
4. ADAPTIVE TRACKING BASED ON RESIDUAL WHITENING
4.1 Autonomous Detection
In this section, we explain how the Kalman filter can be used to autonomously detect switches between power states.
Note that we use Cs instead of C[n|n−1] since we assume the system has reached steady state.
Once again, we use the steady-state gain Ks as a parameter.
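For intuition on where Cs and Ks come from, a minimal sketch for a hypothetical scalar model: iterate the Riccati recursion until the a priori error variance C[n|n−1] stops changing; the limit is Cs and the corresponding gain is Ks = Cs / (Cs + r). The scalar model and the parameter names a, q, r are assumptions; for a matrix system the same fixed point is usually obtained from a discrete algebraic Riccati equation solver.

```python
def steady_state_gain(a, q, r, c0=1.0, tol=1e-12, max_iter=10_000):
    """Iterate the scalar Riccati recursion to its fixed point.

    Returns (cs, ks): the steady-state a priori variance C[n|n-1]
    and the matching steady-state gain.
    """
    c_pred = c0
    for _ in range(max_iter):
        k = c_pred / (c_pred + r)        # gain at this step
        c_post = (1 - k) * c_pred        # a posteriori variance C[n|n]
        c_next = a * a * c_post + q      # next a priori variance C[n+1|n]
        converged = abs(c_next - c_pred) < tol
        c_pred = c_next
        if converged:
            break
    return c_pred, c_pred / (c_pred + r)
```

For a = 1, q = 1, r = 1 the fixed point satisfies Cs^2 − Cs − 1 = 0, so Cs is the golden ratio (about 1.618) and Ks is about 0.618. Precomputing Ks once means the runtime update needs no variance propagation.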
Basically, we evaluate the error between observation and prediction (the residual) and estimate its autocorrelation.
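A minimal sketch of a residual whiteness check, under stated assumptions: if the filter's noise model matches the current power state, the residuals y[n] − ŷ[n|n−1] should be approximately white, so their normalized sample autocorrelation at nonzero lags should stay inside roughly ±1.96/√N (a common 95% band for white noise). The lag count and the rule "more than one lag outside the band means a power-state switch" are hypothetical choices, not the paper's detection criterion.

```python
import math

def normalized_autocorrelation(residuals, max_lag):
    """Sample autocorrelation rho(k) for k = 1..max_lag, normalized by lag 0."""
    n = len(residuals)
    mean = sum(residuals) / n
    e = [x - mean for x in residuals]
    c0 = sum(x * x for x in e)
    return [sum(e[i] * e[i + k] for i in range(n - k)) / c0
            for k in range(1, max_lag + 1)]

def residuals_look_white(residuals, max_lag=10, z=1.96, max_violations=1):
    """Hypothetical detection rule: return False (suspected power-state switch)
    when more than max_violations lags fall outside +/- z / sqrt(N)."""
    rho = normalized_autocorrelation(residuals, max_lag)
    bound = z / math.sqrt(len(residuals))
    violations = sum(1 for r in rho if abs(r) > bound)
    return violations <= max_violations
```

A slowly varying residual (for example, a persistent bias left after the power state changes) shows strong positive correlation at small lags and fails the check.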
Wednesday, September 1, 2010
Friday, August 20, 2010
paper 5.2 Consistent Runtime Thermal Prediction and Control Through Workload Phase Detection
ABSTRACT
Elevated temperatures impact the performance, power consumption, and reliability of processors, which rely on integrated thermal sensors to measure runtime thermal behavior.
1. INTRODUCTION
Dynamic thermal management (DTM) techniques allow processors to optimize performance while avoiding thermal violations. The most well-known DTM techniques include clock gating, dynamic voltage and frequency scaling (DVFS), and thread migration/scheduling [14, 18, 6, 5].
2. RELATED WORK AND MOTIVATION
3. PROPOSED PHASE-AWARE THERMAL PREDICTION METHODOLOGY
At the highest level, our phase-aware thermal prediction approach
takes raw performance counter data that is periodically measured
for each core during workload operation and translates this data
into a temperature projection for some interval into the future using
the concept of workload phases.
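A minimal runtime sketch of the idea, assuming an offline step has already produced one centroid per phase (in performance-counter space) and one temperature model per phase. Phase detection by nearest centroid and the per-phase linear model T_next = a * T_now + b are illustrative assumptions, not the paper's exact formulation; all names are hypothetical.

```python
def nearest_phase(counters, centroids):
    """Index of the centroid closest (squared Euclidean distance) to a counter vector."""
    def dist2(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))
    return min(range(len(centroids)), key=lambda i: dist2(counters, centroids[i]))

def predict_temperature(counters, current_temp, centroids, phase_models):
    """Detect the phase, then project temperature one interval ahead.

    phase_models[i] = (a_i, b_i): hypothetical per-phase model
    T_next = a_i * T_now + b_i.
    """
    phase = nearest_phase(counters, centroids)
    a, b = phase_models[phase]
    return phase, a * current_temp + b

# Example with made-up, normalized counter values (e.g. IPC, cache miss rate).
centroids = [[0.2, 0.1], [1.5, 0.8]]
phase_models = [(0.9, 4.0), (0.95, 5.0)]
phase, t_next = predict_temperature([1.4, 0.9], 60.0, centroids, phase_models)
```

Keeping the runtime work to a distance computation and one multiply-add per interval is in the spirit of moving the expensive analysis offline.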
In order to define workload phases and capture temperature dynamics
within them in a computationally efficient manner, we propose
the methodology that is illustrated by Figure 2.
3.1 Offline Thermal Phase Analysis
In order to avoid excessive runtime overhead, global phase analysis
and within-phase temperature modeling are performed offline
using data generated for a set of representative workloads.
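A minimal offline sketch, assuming phases are found by plain k-means (Lloyd's algorithm) over counter vectors from the representative workloads, and that each phase's temperature model is a least-squares line from the current to the next-interval temperature. Both choices are stand-ins picked for illustration; the paper's actual clustering and within-phase model may differ. The output is a list of centroids plus a list of (a, b) pairs, one per phase.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k),
                      key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centroids[j])))
                  for p in points]
        # Update step: move each centroid to the mean of its members (keep it if empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def fit_linear(xs, ys):
    """Least-squares fit ys ~ a * xs + b; falls back to a = 0 if xs has no spread."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def offline_phase_analysis(samples, k):
    """samples: list of (counter_vector, temp_now, temp_next) from training runs.

    Returns (centroids, models) with models[j] = (a_j, b_j) for phase j.
    Assumes every phase ends up with at least one sample.
    """
    centroids, labels = kmeans([s[0] for s in samples], k)
    models = []
    for j in range(k):
        members = [s for s, lab in zip(samples, labels) if lab == j]
        models.append(fit_linear([m[1] for m in members], [m[2] for m in members]))
    return centroids, models
```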
3.2 Runtime Thermal Prediction and Control
4. EXPERIMENTAL RESULTS
A. Experimental Infrastructure.
5. CONCLUSIONS
Wednesday, August 18, 2010
paper 5.1 Thermal Monitoring of Real Processors: Techniques for Sensor Allocation and Full Characterization
ABSTRACT
1. INTRODUCTION
2. BACKGROUND AND PREVIOUS WORK
3. FREQUENCY DOMAIN TECHNIQUES
4. PROPOSED THERMAL SENSOR ALLOCATION TECHNIQUES
5. PROPOSED FULL RUNTIME THERMAL CHARACTERIZATION TECHNIQUES
A. k-LSE using Pre-determined Thermal Characterization
B. Compressive Sensing
6. EXPERIMENTAL RESULTS
7. CONCLUSIONS
Tuesday, August 17, 2010
paper 4.4 An Effective GPU Implementation of Breadth-First Search
ABSTRACT
1. INTRODUCTION
2. PREVIOUS APPROACHES
3. OUR GPU SOLUTION
3.1 Overview of CUDA on the Nvidia GTX280
3.2 Hierarchical Queue Management
3.3 Hierarchical Kernel Arrangement
4. EXPERIMENTAL RESULTS
Friday, August 13, 2010
paper 4.3 Timing Analysis of Esterel Programs on General-purpose Multiprocessors Lei
ABSTRACT
1. INTRODUCTION
2. OVERVIEW OF ESTEREL
3. CODE GENERATION
4. TIMING ANALYSIS
4.1 Computing Start Times
4.2 Inter-processor Infeasible Paths
4.3 WCET Calculation of a Basic Block
4.4 WCRT Analysis
5. EXPERIMENTAL RESULTS
Tuesday, August 10, 2010
paper 4.2 A Probabilistic and Energy-Efficient Scheduling Approach for Online Application in Real-Time Systems
ABSTRACT
1. INTRODUCTION
2. PRELIMINARIES
2.1 System and Task Model
2.2 Motivating Example
2.3 General Scheduling Concept
2.4 Expected Energy and Time Demand
3. ENERGY MINIMIZATION PROBLEM
4. SOLUTION
4.1 Relaxed Energy Minimization Problem
4.2 General Energy Minimization Problem
4.3 Implementation
5. EXPERIMENTS
5.1 Experimental Setup
5.2 Results
6. CONCLUSIONS
Friday, August 6, 2010
Paper 4.1 LATA: A Latency and Throughput-Aware Packet Processing System
ABSTRACT
1. INTRODUCTION
2. LATA SYSTEM DESIGN
2.1 Program Representation
2.2 Communication Measurement
2.3 Problem Statement
2.4 DAG Generation
3. LATA SCHEDULING, REFINEMENT AND MAPPING
3.1 List-based Pipeline Scheduling Algorithm
3.2 Search-based Refinement Process
3.2.1 Latency Reduction
3.2.2 Throughput Improvement
3.3 Cache-Aware Resource Mapping
3.3.1 Pre-mapping
3.3.2 Real Mapping
4. EXPERIMENT FRAMEWORK
5. PERFORMANCE EVALUATION
5.1 Comparison with Parallel System
5.2 Comparison with Three NP Systems
5.3 Latency Constraint Effect
5.4 Scalability Performance of LATA
5.5 Instruction Cache Size Performance
Tuesday, August 3, 2010
Paper 3.3 Online SystemC Emulation Acceleration
Abstract
1. INTRODUCTION
2. RELATED WORK
3. SYSTEMC IN-SYSTEM EMULATION ARCHITECTURE
3.1 Base Architecture with Acceleration Engines
3.2 Kernel Bypass
4. ONLINE ACCELERATION ASSIGNMENT
4.1 Problem Definition
4.2 Communication Overhead
5. HEURISTICS
5.1 Upper and Lower Bounds
5.2 Accelerator Static Assignment
5.3 Greedy Heuristic
5.4 Aggregate Gain
6. EXPERIMENTS
6.1 Framework
6.2 Evaluation
Saturday, July 31, 2010
Paper 3.2 Abstraction of RTL IPs into Embedded Software
ABSTRACT
1. INTRODUCTION
2. SW CODE GENERATION ALGORITHM
2.1 EFSM generation
2.2 Merge of processes
2.3 Abstraction of HDL scheduler
It is like sitting through a presentation of another team's work.
2.4 Definition of interface and communication protocol
3. EXPERIMENTAL RESULTS
4. CONCLUSIONS
Thursday, July 29, 2010
Paper 3.1 A Mixed-Mode Vector-Based Dataflow Approach for Modeling and Simulating LTE Physical Layer
ABSTRACT
Long Term Evolution (LTE)
synchronous dataflow (SDF)
Mixed mode Vector-based Dataflow (MVDF)
1. INTRODUCTION
In SDF, a system is represented as a dataflow graph consisting of functional models.
The effort in using SDF/TSDF to develop physical layer reference designs, however, encounters fundamental limitations due to the constant rate constraint in SDF semantics.
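To make the constant rate constraint concrete, a minimal sketch of the SDF balance equations for a chain of actors: if actor u produces p tokens per firing on an edge and actor v consumes c tokens per firing, a periodic schedule must satisfy r_u * p = r_v * c, where r is the repetition count. Because p and c are fixed, the repetition vector can be computed before execution; a rate that varies at runtime, such as a transport block size that can change between subframes, has no single p or c to plug in. The chain topology and the function name are assumptions for illustration; a general SDF graph also needs a consistency check across cycles and parallel paths.

```python
from fractions import Fraction
from math import lcm

def repetition_vector(edge_rates):
    """Smallest positive integer firing counts for a chain of SDF actors.

    edge_rates[i] = (p, c): tokens produced by actor i and consumed by
    actor i + 1 per firing on the edge between them.
    """
    reps = [Fraction(1)]
    for p, c in edge_rates:
        # Balance equation: reps[i] * p == reps[i + 1] * c
        reps.append(reps[-1] * p / c)
    # Scaling by the lcm of the denominators gives the smallest integer solution.
    scale = lcm(*(r.denominator for r in reps))
    return [int(r * scale) for r in reps]
```

For example, rates (2, 3) then (1, 2) give [3, 2, 1]: three firings of the first actor produce six tokens, which feed two firings of the second, whose two tokens feed one firing of the third.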
2. BACKGROUND
3. RELATED WORK
4. LTE PHYSICAL LAYER ANALYSIS
In this section, we analyze the LTE physical layer from a dataflow point of view.
A transport block, denoted as $a = a_0, a_1, \ldots, a_{A-1}$, where $A$ is the transport block size, is a sequence of bits to be transmitted in the shared channel to a user or a set of users.
5. MIXED-MODE VECTOR-BASED DATAFLOW
6. SIMULATION RESULTS
Friday, July 23, 2010
2.1 Post-silicon Validation Challenges: How EDA and Academia Can Help
1. INTRODUCTION
Post-Silicon validation of large microprocessor designs entails testing of components in a system setting.
It involves multiple different aspects, such as logic validation and debug, electrical validation and debug, and debugging software and customer issues.
The disciplines of post-silicon validation include System Validation (SV), Compatibility Validation (CV), and Electrical Validation (EV).
2. CHALLENGES OF POST-SILICON VALIDATION
In addition, JTAG ports, used to control many of the DFV hooks on the die, are too slow for real-time control and synchronization of events.
(calling for new ways of controlling DFV hooks)
3. SOLUTION VECTORS
3.1 Pre-silicon Engagement
3.2 Post-silicon Opportunities
3.3 Survivability (Post-silicon Debug and Infield Repair)
To assist with debug of the issues in the post-silicon phase and to survive issues in the field, designs need to implement comprehensive survivability features.
3.4 EDA and Research Opportunities
1 EDA Challenges and Options: Investing for the Future
"Overall, it is likely that a set of design cost driven challenges such as design productivity, power management, design for manufacturability, signal integrity, and reliability, will continue to dominate the EDA roadmaps."
Silicon technology complexity-driven challenges can be generally classified as:
-- Leakage, power management, circuit/device innovation, current delivery
-- Signal integrity analysis and management
-- Manufacturing variability
-- Manufacturing handoff, NRE cost
-- Scaling of global interconnect performance relative to device performance
-- Decreased reliability, electro-migration, SER, fault-tolerance
Challenges driven by system complexity [1] can be generally classified as:
-- Reuse, hierarchical design, heterogeneous SOC integration, especially for mixed-signal
-- Verification and test
-- Cost-driven design optimization, co-optimization at die-package-system levels
-- Embedded software design, co-design with hardware and for networked system environments
-- Reliable implementation platforms, chip implementation onto multiple circuit fabrics, higher-level handoff to implementation
-- Design process management: design team size and geographic distribution, data management, collaborative design support