If I have a CPU with two cores, and each core has its own L1 cache, is it possible that Core1 and Core2 cache the same part of memory at the same time?
Yes. Performance would be terrible if this wasn't the case. Consider two threads running the same code. You want that code in both L1 caches.
If it is possible, what will the value in main memory be if both Core1 and Core2 have modified their copies of that value in cache?
The old value will be in main memory, which won't matter since neither core will read the stale copy. Before evicting a modified value from cache, it must be written back to memory. Typically some variant of the MESI protocol is used. In the traditional implementation of MESI, if a value is modified in one cache, it cannot be present at all in any other cache at that same level.
Yes. "Processor" is a generic term used to describe any sort of CPU, regardless of the number of cores. The same goes for "CPU": it does not imply single- or multi-core and can be used to refer to either.
A central processing unit (CPU) is the electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logical, control and input/output (I/O) operations specified by the instructions. The term has been used in the computer industry at least since the early 1960s. Traditionally, the term "CPU" refers to a processor, more specifically to its processing unit and control unit (CU), distinguishing these core elements of a computer from external components such as main memory and I/O circuitry.
Processing performance of computers is increased by using multi-core processors, which essentially means integrating two or more individual processors (called cores in this sense) onto one integrated circuit. Ideally, a dual-core processor would be nearly twice as powerful as a single-core processor. In practice, the performance gain is far smaller, often only about 50%, due to imperfect software algorithms and implementations. Increasing the number of cores in a processor (i.e. dual-core, quad-core, etc.) increases the workload that can be handled, meaning the processor can handle numerous asynchronous events, interrupts, etc. that would otherwise take a toll on a single overwhelmed CPU. These cores can be thought of as different floors in a processing plant, with each floor handling a different task. Sometimes these cores will handle the same tasks as adjacent cores if a single core is not enough to handle the information.
Due to specific capabilities of modern CPUs, such as hyper-threading and uncore, which involve sharing of actual CPU resources while aiming at increased utilization, monitoring performance levels and hardware utilization gradually became a more complex task.
Multi-processor systems are different, however. This refers to a computer whose motherboard supports more than one processor (usually 2 to 8 CPUs, though some supercomputers use special hardware that allows many more on a single motherboard). Here is the catch: multi-processor computers can (and usually do) use multi-core CPUs. For example, I have built several multi-processor servers with two Intel Xeon 5560 quad-core CPUs. This particular CPU offers a technology known as Hyper-Threading, which presents each of the 4 physical cores as two logical cores, effectively giving you a total of 8 logical cores per CPU. Since we have 8 logical cores per CPU with Hyper-Threading, and the system is multi-processor, the end result is a system with 16 logical cores. Each core can process a thread independently of the others, which means you have a lot more power to process information than you would with a single CPU.
To answer directly: modern x86 CPUs are indeed superscalar, capable of fetching, scheduling, and executing multiple instructions per clock cycle.
As a slightly extreme example, a modern i7-6950X core is reportedly capable of 10.6 instructions per clock cycle (per core) in the Dhrystone MIPS benchmark, most likely due to instruction fusion and other smart features in and around the core that make it more efficient than a simple 1:1 instruction stream.
The front end of the CPU handles instruction decoding and passes micro-ops (uOPs: instructions broken down into simpler operations, or sometimes fused together) to the execution engine, which then routes and dispatches them to the various units capable of handling different instruction types.
In a Skylake CPU there are multiple units capable of integer arithmetic and logic (INT ALU), as well as vector processing and FP math. In theory an instruction could be dispatched to each of those units at the same time for execution, but in practice there is a limit on how many uOPs can be dispatched at once and to which units.
There is also the problem that instructions have different timings, so not all processing units become available at the same time.
As for registers, internally the CPU can remap and replace the registers used by a program to better suit the actual execution units (a technique known as register renaming). In the image below you can see that Skylake has over 300 physical registers: 180 integer and 168 vector registers. These will be used as required.
Wikichip is an awesome place to find out more about CPU architecture in general. Below is an image showing the functional blocks in a Skylake CPU core.
You cannot dispatch two instructions to the same port in one clock cycle, but instructions can be queued per port, or allocated to another port for execution as long as that port is capable of executing the instruction type.